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ABSTRACT 


The World Wide Web (WWW) has beeome a major souree of easily aceessible 
information for students, professionals, researchers and the general public. However, the 
volume of information available through the Web is so overwhelming that it is not 
unusual to get tens of thousands of "hits" when conducting a relatively simple search. 
Most existing search techniques use brute force based on keyword matches to find related 
Web pages. While the enormous speed of search engines improves the efficiency of such 
methods, effectiveness is not improved. 

The objective of this thesis is to construct and test an ontology-based application 
to help users identify the most pertinent keywords for a search. By navigating ontologies 
that describe domains of interest, users are assisted in finding a relevant set of key terms 
that will aid the search engines in narrowing, widening, or refocusing a Web search. 
Specifically, the thesis develops an ontology-aided Web search assistant prototype to help 
users enhance the relevance and precision of the returned results through the use of a 
context provided by ontologies associated with each search. 
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EXECUTIVE SUMMARY 


The World Wide Web (WWW) is an easily accessible source of information for 
students, professionals, researchers, as well as the general public. However, the volume 
of information available through the Web is so overwhelming that it is not unusual to get 
tens of thousands of "hits" when conducting a relatively simple search. Improving Web 
searches has, therefore, become an important and potentially lucrative area of research 
and development. Since the current Web lacks embedded semantics which allow 
machines to decipher the true content of the Web pages, as Tim Berners-Lee envisions 
for the “Semantic Web”, most Web search portals use brute force techniques based on 
keyword matches mapped to indexed Web content. While the enormous speed of search 
engines improves the efficiency of such methods, effectiveness is not improved. 

The objective of this thesis is to construct an ontology-based application to help 
users identify the most pertinent keywords for a search. By navigating ontologies that 
describe domains of interest, users are assisted in expanding the relevant set of key terms 
that will aid the search engines in narrowing or refocusing a Web search. Specifically, 
the thesis develops an ontology-aided Web search assistant prototype to help users 
enhance the relevance and precision of the returned results through the use of context 
provided by ontologies. An experiment to measure whether the developed application 
benefits the end user is also conducted as part of this thesis. 

This thesis is organized as follows. Chapter II is dedicated to understanding the 
semantics and syntax of RDF and OWL. Although many ontology development tools 
hide the details of OWL syntax, an ontology developer must have a good understanding 
of the language in order to construct a valid ontology. Chapter III describes the proposed 
methodology for building an ontology. It details a seven-step approach for developing 
OWL based ontologies. A geography domain is used to illustrate OWL constructs and 
for developing a complete geography ontology for use in the OAKDA application. 
Chapter IV discusses the use of onotologies as knowledge bases in the OAKDA 
application. It specifically describes three example scenarios of using ontologies to 


XIX 



discover domain knowledge and ereate intelligent Web searches. Chapter V details the 
arehitecture and construction of the OAKDA application. This chapter describes in detail 
the components used to build the prototype applieation and how they communicate with 
eaeh other within the architecture of OAKDA. Chapter 6 tests the main thesis research 
question: Can an ontology-based Web seareh application increase the effeetiveness of 
Web seareh results over existing approaehes for those searches that require a deep 
contextual knowledge of the domain of interest? The experiment eompares the 
effeetiveness of the results of Web seareh queries formulated by study partieipants using 
the OAKDA applieation with those obtained by the same partieipants using the widely 
popular Google search engine. The results show that OAKDA and the ontologies which 
it implements have a statistieally signifieant positive effect on the precision of web search 
queries generated by the participants in the experiment. Finally Chapter 7 concludes the 
thesis by summarizing our effort to develop a tool to help users improve their Web 
searches. The chapter discusses the application's effectiveness, the lessons learned, and 
possible future enhancements. 
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I. INTRODUCTION 


A. BACKGROUND 

The World Wide Web (WWW) is an easily aeeessible source of information for 
students, professionals, researchers, as well as the general public. However, the volume 
of information available through the Web is so overwhelming that it is not unusual to get 
tens of thousands of "hits" when conducting a relatively simple search. Improving Web 
searches has, therefore, become an important and potentially lucrative area of research 
and development. Since the current Web lacks, for the most part, embedded semantics 
which allows machines to decipher the true content of the Web pages, as Tim Berners- 
Lee envisions for the “Semantic Web”, most existing search techniques must use brute 
force based on keyword matches to find related Web pages. While the enormous speed of 
search engines improves the efficiency of such methods, effectiveness is not improved. 

We define an effective search as one returning to the user a manageable set of 
highly relevant results. More formally, an effective search returns a results set with a 
high degree of “aboutness.” Aboutness is broadly defined as a degree in which a set of 
returned resources is “about” a particular domain of interest. For instance, “if a system 
determines that a document d is topically related (i.e. about) to query q, then the 
document is returned to the user.” [Bruza et ah, 2000, 1] Unfortunately, the current 
technology’s inability to identify the true context of Web sites makes it difficult to 
determine the aboutness of any resource. However, the aboutness of Web search results 
can be improved if there is an appropriate set of search terms that narrows the query 
match results to the most relevant sets of domain resources. 

Broad keywords or one-word searches will often result in a large number of hits, 
many of them irrelevant. Conversely, using the appropriate keyword(s) will result in 
high degree of aboutness in the returned set, therefore resulting in a highly effective 
search. In order to take advantage of currently available search engines to return user- 
specific relevant results, an intelligently crafted list of keywords and phrases needs to be 
formed. To develop such a list, it is necessary to have sufficient knowledge of the 
domain of context. Unless the user is a subject matter expert, finding the relevant and 
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related terms of a domain may be diffieult or even erroneous. An ontology, whieh is 
model of a domain of eontext, ean fill the role of a domain expert and support the 
identifieation of preeise and relevant keywords. 


B, OBJECTIVES 

As diseussed, an effeetive means for obtaining more effective Web search results 
is to have a comprehensive understanding or knowledge of the context of the search 
term(s). The availability of a domain knowledge in conjunction with the search terms 
would be extremely useful for determining an appropriate set of search terms, thus 
retrieving the most appropriate Web resources. In order to gain contextual knowledge, 
ontologies, defined as "specification of conceptualization," [Gruber, 1993, 1] can be 
mined to discover relevant information about the domain of interest. Specifically, this 
thesis refers to an ontology as a system of knowledge representation that allows machines 
and humans to understand the definitions of concepts and the relationships between them. 
The authors propose ontologies as an appropriate knowledge representation system that 
can be used to create search queries with a greater aboutness value. 

The objective of this thesis is to construct an ontology-based application to help 
users identify the most pertinent keywords for a search. By navigating ontologies that 
describe domains of interest, users are assisted in finding a relevant set of key terms that 
will aid the search engines in narrowing, widening, or refocusing a Web search. 
Specifically, the thesis develops an ontology-aided Web search assistant prototype to help 
users enhance the relevance and precision of the returned results through the use of a 
context provided by ontologies associated with each search. 

C. THE RESEARCH QUESTION 

The primary research question of this thesis is: Can an ontology-based 
application be built to narrow, expand, refine and increase precision of Web search 
terms ? In order the address the primary question, the thesis will also address several 
secondary questions. First, what is an appropriate approach for accessing and 

processing of contextual information of an OWL knowledge base? Second, what is the 
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most appropriate architecture for the prototype application? Third, how can an ontology 
inference engine interface with the application? Last, is there a method of visually 
rendering the ontologies for greater usability, navigation and comprehension of domain 
knowledge ? 

The ability to answer these questions will depend largely on finding and determining the 
necessary software components and building the interface capabilities between them. 

The challenge will be to learn the necessary technologies and skills to develop the most 
appropriate architecture and implementation. 

D, SCOPE, LIMITATIONS AND ASSUMPTIONS 

The scope of this thesis is to research and understand ontologies, their 
development languages and methodologies, build, and test an ontology-based Web 
application that aids users in designing search term lists. While the realization of the 
Semantic Web would allow search engines or agents to process all Web resources based 
on their content rather than as string of characters, this vision requires redevelopment of 
all Web sites currently available. The application proposes an interim solution for Web 
search by employing ontologies as its knowledge system for discovery and search results 
enhancement using current Web technologies. 

The success of the application is contingent on the availability and reliability of 
ontologies for domain knowledge representation. Although at the time of this thesis, 
libraries of ontologies are still limited, the prevalence and popularity of ontologies is 
growing in various fields, both in academia and commerce. It is the authors' belief that 
applications, such as the one developed here, further encourage the growth and validity of 
ontologies and suggest a usable architecture for building applications which operate in 
this milieu. 

It is assumed that the readers have a working knowledge of HTML and XML, as 
well as have a general understanding of software systems. Readers are also assumed to 
be familiar with Web search engines, such as Google and Yahoo!, and how Web search is 
performed using one or more terms. Furthermore, it is the authors' argument that 
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ontologies' contextual domain knowledge provides the user with a selection of domain¬ 
relevant search terms which narrows and improves the search results. 


E, METHODOLOGY 

The methodology used in the development of this thesis consists of two parts. 

The first part is dedicated to the research and understanding of ontologies, languages, and 
methodologies for ontology development. This is achieved by first understanding and 
mastering existing semantic languages for the Web, namely Resource Description 
Framework (RDF) and Web Ontology Language (OWL). Second, a review of literature 
on ontologies is conducted, within and outside the vision of the Semantic Web, 
emphasizing their important contribution to knowledge representation in information 
technology. Third, existing development methodologies are adapted to construct OWL 
ontologies, and a seven-step methodology for constructing valid OWL ontologies is 
proposed. Fourth, a sample ontology is built to demonstrate the semantics of RDF and 
OWL, as well as the proposed ontology development methodology. Last, examples are 
demonstrated to show how ontologies can be used to discover knowledge and aid the 
construction of relevant search terms. 

The second part of the methodology used in this thesis is related to the 
development of the prototype Ontology Aided Knowledge Discovery Assistant 
(OAKDA), pronounced "Oak D-A", application that will assist users with discovering 
domain knowledge and designing their Web search terms for the most relevant results. It 
is best described as a prototype software development methodology. First, a multi-tier 
architecture is designed for the application. Second, a reasoning engine is incorporated, 
using TCP/IP interface model, to inference ontologies. Third, ontology concepts and 
relationships are graphically displayed and can be traversed in any direction. Last, a user 
function to select relevant terms from the ontology and build a search term list is 
constructed and used to query the Google Web search portal and return matched results. 
In order to test the research question, this thesis will also examine whether ontology- 
aided Web searches perform better than those relying only on a search engines such as 
Google. While it is difficult to measure the accuracy or aboutness of search results, the 
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research design will measure the answer precision and content-relatedness of the returned 
Web pages of the test questions in order to determine the performance of the ontology- 
aided search tool developed in this thesis. 


F. ORGANIZATION OF THE STUDY 

This thesis is organized as follows. Chapter II is dedicated to understanding the 
semantics and syntax of RDF and OWL. Although many ontology development tools 
hide the details of OWL syntax, an ontology developer must have a good understanding 
of the language in order to construct a valid ontology. Chapter 3 describes the proposed 
methodology of building an ontology. It details a seven-step approach for developing an 
OWL ontology. A geography domain is used to illustrate OWL constructs and for 
developing a complete geography ontology for use in the OAKDA application. Chapter 
4 discusses the use of ontologies as knowledge bases in the OAKDA application. It 
specifically describes three example scenarios of using ontologies to discover domain 
knowledge and create intelligent Web searches. Chapter 5 details the architecture and 
construction of the OAKDA application. This chapter describe in detail the components 
used to build the prototype application and how they communicate with each other within 
the architecture of OAKDA. Chapter 6 tests the thesis research question by comparing 
the precision and content-relatedness of the ontology-aided searches to those performed 
solely by search engines. Finally Chapter 7 concludes the thesis by summarizing our 
effort to develop a tool to help users improve their Web searches. The chapter discusses 
the application's effectiveness, the lessons learned, and possible future enhancements to 
the application. 
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II. RDF AND OWL ONTOLOGIES OVERVIEW 


A. INTRODUCTION 

An ontology can be defined as a formal explieit deseription of eoneepts in a 
domain of diseourse, properties of eaeh eoneept describing various features and attributes 
of the eoneept, and restrietions on these properties that are speeified by semanties, or 
rules, that follows the “rules” of the domain of knowledge. For these reasons, ontologies 
are useful as knowledge bases (KB) for an applieation attempting to add eontext to a 
partieular seareh word or phrase. By navigating the ontologies, we ean understand the 
eontext of a partieular eoneept as well as the relationships it has with other eoneepts. 

This ehapter focuses on the syntax and eonstruet of ontology languages. The 
World Wide Web Consortium (W3C) has ereated a language speeifieally for ontologies, 
known as the Web Ontology Language (OWL), built upon previous Web languages 
ineluding the syntaetie foundation of the Web, XML, and Resource Deseription 
Framework (RDF)i. OWL is considered an extension of RDF, and as it will be evident in 
the discussion below, OWL ontologies use RDF syntax to define their resourees. 

The importanee of RDF and, in partieular, OWL to building ontologies is the 
ability to add semantics around their eoneepts that is not possible with XML. XML 
provides meta data to deseribe the eontent of its doeuments. However, the deseription 
alone does not explain anything about the relationship of the content; it is void of 
semanties. The goal of using ontologies is to move beyond the meta data to a more 
semantieally aware systems. RDF helps to aceomplish this by providing a meehanism for 
ereating meta data about resourees on the Web, i.e., any information that ean be retrieved 
or simply identified on the Web. RDF was developed for proeessing information by 
applieations, rather than simply displaying it for humans. It ereates a eommon 
framework for exehanging information aeross applieation without losing any of their 
meanings. However, RDF does not provide mueh eapability for semanties, leading to the 


1 For the purpose of this thesis, a working knowledge of XML will be assumed. While certain 
components of XML will be defined, detailed explanation of the syntax will not be discussed. 
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development of OWL .2 OWL is able to extend RDF with its semantic constructs, 
allowing it to define and instantiate concepts and their relationships within an ontology. 

Tim Berners-Lee’s vision of the Semantic Web is to move beyond the mere data 
presentation and meta data using HTML and XML, respectively. He argues that in order 
for the Semantic Web to work, systems need to have access to structured information 
along with a set of inference “rules” for processing automated reasoning [Bemers-Lee et 
ai, 2001, 2]. Bemers-Lee proposed a layered architecture for the Semantic Web as 
represented in Figure 1. As the figure shows, newer technologies stack on top of 
previous ones to achieve the realization of the Semantic Web. OWL was not part of the 
original stack because it was in its infant stages when Bemers-Lee proposed the 
architecture. However, as Figure 1 shows, OWL fits naturally between the RDF and 
Ontology layers. 


Trusted SW 


Proof 
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Rules/Query 
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RDF Model & Syntax 


XML Query 


XML Sefierraa 
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Figure 1. The Semantic Web Layers 


At the bottom of the stack are the simplest forms of web identifiers, the Uniform 
Resource Identifier (URI) and Unicode. URI provides a means for resources to be 


2 OWL was derived from an earlier ontology language developed by DARPA called DAML+OIL. 
More information on DAML+OIL can be found on www.w3.org/TR/daml+oil-reference . April 2005. 
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uniquely identified and retrieved on the web. Unieode is an extension of ASCII that has 
the ability to encode the characters of all languages, rather than only the Roman alphabet. 

The second layer is made up of XML and namespaces. XML is a set of syntax 
rules for structured documents. However, it does not enforce any semantic constraints on 
the documents. Namespaces are extensions of XML and they are a mechanism for 
differentiating elements and attributes of a particular vocabulary in order to make them 
globally unique. Namespaces allow different XML documents to be combined without 
ambiguity. 

The third layer of the stack is XML query (XQuery) and XML schema. As the 
name implies XQuery is a query language for XML documents. XQuery was designed to 
query XML-based data sources as one would do to databases. XML Schema is a 
definition language that limits the conforming XML documents to a specific vocabulary 
and hierarchical structure. The fourth layer consists of RDF model and syntax. RDF is 
an XML-based language used to describe objects or “resources,” as well as the 
relationship between them. This provides a data-model, known as RDF Triple, using 
simple semantics and the resources are accessed using URIs. RDF Schema is a language 
that describes RDF classes and properties, with semantics that specify the hierarchies of 
these classes and properties. 

The fifth layer is the ontology itself Ontologies are the key components to the 
Semantic Web because they contain the domain “knowledge” that defines concepts and 
their relationships to each other. Although RDF does have some capabilities to represent 
relationship between resources, it lacks the semantic richness and inference capacity that 
the OWL provides. OWL is built on RDF, adding greater vocabulary for specifying 
relationships between classes, individuals, property characteristics, and enumerated 
classes, that allows for developing rich ontologies. 

The next two layers, Rules/Query and Logic provide mechanisms for querying 
and inferencing to provide information about the domain of interest. Although 
description logic of OWL lacks full expressiveness, it does have computational 
completeness, which is not possible with RDF. 
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Little is currently understood about the top two layers of the Semantic Web stack, 
the Proof and Trust, but they are expected to be the focus of future development efforts. 
The idea behind Proof and Trust is that when information is retrieved from the Semantic 
Web, it must be proven and trusted as the right answer. For instance, if one source states 
that the country of Armenia is in the continent of Europe and another source states that it 
is in the continent of Asia, which source should be trusted? These inconsistencies would 
make it impossible for the success of the Semantic Web. However, as the focus shifts to 
proof checking mechanisms and digital signatures, these layers will provide a step closer 
to Berners-Lee’s goal for the next generation of the Web [Palmer, 2001, 11]. 

The remainder of this chapter addresses the syntax and constructs of RDF and 
OWL. It is organized as follows. Section B is a brief overview of RDF while section C 
discusses the syntax and constructs of OWL. 

B, RDF PRIMER 

RDF was developed to represent information about Web resources. The term 
“resource,” which has a broad meaning, can be simply thought of as the electronic file 
available out on the Web [Daconta et ah, 2003, 12]. Rather than just displaying 
information for human consumption, the RDF was developed to allow machines and 
applications to process information. Such an exchange of information is possible with 
the RDF common framework, which uses parsers and data processing mechanisms. The 
RDF capabilities to uniquely identify resources and share information across all 
platforms lay the core foundation for semantically richer languages, such as OWL. 

1, RDF Triple 

There are three necessary RDF components for identifying a piece of information. 
Known as the RDF triplQ, it consists of subject, predicate, and object. Consider a 
common English statement below. 

http.V/www.ontology.net/geography.html has a creator whose value is Ann Lee. 
The above statement is broken down into the RDF triple elements as follows. 
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1. SUBJECT: http://www.ontologv.net/geographv.html 

2. PREDICATE: creator 

3. OBJECT: Ann Lee 

In order for machines to understand the meaning or “knowledge” of this statement, the 
components of the English sentence must be formatted in such a way that they are 
machine-consumable. RDE accomplishes this by using what the URI references 
(URIrefs). URIrefs are RDF’s primary mechanism for specifying subject, predicate and 
object in its statements. An URIref has two parts, namely an URI joined with a fragment 
identifier. For example, http://www.ontologv.net/geographv.html#Countrv is a 
combination of the URI http://www.ontologv.net/geographv.html and the fragment 
identifier Country separated by the # sign. Unlike Uniform Resource Eocators (URLs), 
URIrefs do not require direct connection to an actual Web resource. 

The RDF triples are often depicted as a graph using nodes and arcs. For example, 
the English statement above is represented by the graph shown in Figure 2. 



http://purl.org/dc/elements/l.l/creator 



Figure 2. RDF Triple Model 


As the figure shows, the subject and object are represented by nodes, and the predicate by 
an arc from the subject to the object node. Also, the predicate and object are specified by 
an URIref, rather than simple values of “creator” and “Ann Lee”, as in the English 
statement above. The URIrefs uniquely associates any property or value to a particular 
resource identifier. That is, the usage of the property “creator” may have different 
meanings for different developers or applications. In order to clarify the exact meaning 
of “creator” as implied by the developer, the term is associated with an URIref to make 
the description unambiguous to the user. Therefore, depicting the predicate as 
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“ http://purl.Org/dc/elements/l ■ 1/creator ” helps distinguish it from other meanings of 
“creator,” such as the one with resource of “ http://www.anotheruser.org/term/creator .” 
Furthermore, the use of URIref to identify the property makes it possible to augment 
additional information. For example, the object URIref of a particular RDF statement 
may be used as a subject of another RDF statement. 

It is important to understand that one resource may be part of several RDF 
statements. Consider the diagram below. 


httD://www.ontoloav.net/aeoaraDhv. 



http://purl.ora/dc/ elements/1.1/cre http://so: nesite.ora/creationD 

http://purl.ora/dc/gtCTnents/1.1/cre 


htto://www.ontoloav.net/staffid/123 


February 10, 2005 


Figure 3. Multiple RDF Statements Interconnected 


Figure 3 illustrates how multiple statements interconnect with one another, and provide 
multiple layers of information for a given node. Each arc corresponds with a RDF triple. 
Thus, Figure 3 represents four separate RDF triple statements. Also, even though literals 
may not be used as subjects or predicates, objects may take on a constant value. The node 
http://www.testsite.net/pic.ipeg, representing a picture file, has for its creation date the 
literal value of “February 10, 2005.” 

Like XML, RDL uses namespaces to abbreviate for URIs. Lor example, the 
prefix myont is the namespace representation for the URI 

http://www.ontologv.net/geographv# . Future references to this resource, then, may be 
written as myont :, followed by the fragment identifier such as myont: Country. 
Some common namespace prefixes are as listed in Table 1. 
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Prefix 

URI 

rdf: 

http://www.w3.Org/1999/02/22-rdf-svntax-ns# 

rdf s : 

http://www.w3.Org/2000/01/rdf-schema# 

dc: 

http://purl. 0 rg/dc/elements/l. 1/ 

owl: 

http://www.w3.Org/2002/07/owl# 

xsd: 

http://www.w3.Org/2001/XMLSchema# 


Table 1. List of Common Namespaces 


Within the RDF namespace, there are defined constructs that denote different 
relationship semantics. For instance, the property rdf : type is used to define the 
various kinds of relationships that exist between resources. This is especially useful with 
RDF Schema specification vocabularies of Class and subClassOf , where 
rdf : type is analogous to the instanceOf property in object-oriented languages. 
RDF constructs, such as rdf : type, will be discussed further in the sections below. 

2, RDF Schema 

RDF Schema, the semantic extension of RDF, allows for the development of an 
application-neutral vocabulary for defining class and subclass hierarchies, as well as 
properties to describe these classes. Unlike object-oriented languages that RDF Schema 
is often compared to, the RDF vocabulary defines properties in terms of resource classes. 
That is, one can create new properties out of existing ones simply by adding to the 
original property specifications without redefining the original description and 
restrictions of the class. This is the benefit of RDF Schema's property-centricity, having 
the ability to extend the existing resource descriptions. 

The facilities of RDF Schema are predefined with its own set of RDF vocabulary 
under the resource, http://www.w3.Org/2000/01/rdf-schema# . The vocabulary may be 
referenced using the rdf s namespace as specified above. The RDF examples below will 
use the rdf s and rdf (URI: http://www.w3.org/1999/02/22-rdf-svntax-ns#) 
namespaces. 

3, Classes 

RDF Schema categorizes similar “kinds of things” into classes. Syntactically, a 
class in RDF Schema is any resource that has the rdf : type property value of 
rdf s : class. The members of a class, known as instances, are also known as class 
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extensions. Multiple elasses may have the same set of instanee, whieh have all sets of 
properties from eaeh elass. Furthermore, a elass may be defined by a set of its own 
extensions and/or extensions of other elasses. 

A Subclass is a ehild elass, whieh inherits all the eharaeteristies from its parent 
elasses. If elass B is a subelass of elass A, then all the instanees of B are also instanees of 
A. The superelass-subelass relationship is ealled an "is-a" relationship, meaning an 
instanee of elass B is also an instanee of elass A. If elass C is also a subelass of elass A, 
then elass C is a sibling elass to elass B. They share all the same traits inherited from 
elass A, along with their unique properties that differentiate them. The syntax 
rdf s : subClassOf is used to define a subelass. Table 2 lists elass relevant RDF 
eonstruets. 


RDF Construct 

Description 

rdf:type 

Speeifies that a given resouree is an instanee of some elass. 

rdfs:subClassOf 

Speeifies that all the instanees of one elass are also instanees 
of another elass. 

rdfs:label 

Used to provide human-readable names for the resourees. 

rdfs:comment 

Used to provide human-readable deseriptions of the 
resourees. 


Table 2. RDF Class Construets 


4 Properties 

RDF property is the predicate relationship between the subjeet and objeet 
resourees. All RDF properties have the rdf : type value of rdf s : Property. Like 
elasses, properties are arranged in hierarehies where rdf s : subPropertyOf eonstruet 
denotes a taxonomie, superelass-subelass relationship. Thus, if the property Y is a 
subproperty of X, then all resourees related to property Y are also related to property X. 
There are three RDF Sehema syntax used to define properties. The range of a property, 
speeified by rdf s : range, indieates the property’s allowed set of values. The 
property’s domain, rdf s : domain, is used to show that the property is applied to a 
designated elass or set of elasses. These property semanties, further defined in Table 3, 
are used to deseribe RDF properties. 
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RDF Construct 

Description 

rdfs:subPropertyOf 

Specifies that all the properties are also subproperties of 
another property. 

rdfs:range 

Specifies the class instance or a literal that a given property 
must take one as its value. 

rdfs:domain 

Specifies the class that “owns” the property. That is, it 
associates the property with the class it modifies and 
asserts that the subjects of such property statements must 
belong to the instance of the class. 


Tables. RDF Property Constructs 


5, Other RDF Vocabularies 

RDF containers are predefined syntax used to represent collections of resources. 
There are three container vocabularies in RDF Schema, namely rdf : Bag, rdf : Seq, 
and rdf : Alt, which are specified under the class rdf s : Container. The bag 
(rdf : Bag) defines a group of resources or literals where the order of its members is not 
significant. The sequence (rdf : Seq) is a group resources or literals where the order of 
them are, in fact, relevant, whether it is alphabetical, numeric, or other types of ordering. 
The alternative (rdf : Alt) is a group of resources or literals that are “alternatives” to 
the other containers. That is, all the members are alternates of one another. All the 
containers described above allow duplicate members in its list. 

RDF collection differs from the RDF containers in that it is a closed list. In other 
words, it is an exhaustive list of members, exclusive to other possible candidates. In 
specifying the RDF collection, the syntax rdf : List describes the list while 
rdfifirst property refers to the first-member relationship and rdf : rest property 
refers to the rest-of-list relationship. 

Putting all the RDF Schema vocabularies together, here is an example of an RDF 
document in Figure 4. 
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<?xml version="l.0"?> 

<rdf:RDF xmlns: rdf=http ://www.w3.org/1999/02/22-rdf-syntax-ns# 
xmlns:rdf s=http ://www/w3/org/2000/01/rdf-schema# 
xml: base=http ://www/geodesy.org/water/naturally-occurring> 

<rdfs:Class rdf:ID="River"> 

<rdfs:subClassOf rdf:resource="#Stream"/> 

</rdfs:Class> 

<rdfs:Class rdf:ID="Stream"> 

<rdfs:subClassOf rdf:resource="#NaturallyOccurringWaterSource"/> 
</rdfs:Class> 

<rdf:Property rdf:ID="emptiesInto"> 

<rdfs:domain rdf:resource="#River"/> 

<rdfs:range rdf:resource="#BodyOfWater"/> 

</rdfs:Property> 

<rdf:Property rdf:ID="hasLength"> 

<rdfs:domain rdf:resource="#River"/> 

<rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf- 
schema#Literal"/> 

</rdfs:Property> 

</rdf:RDF> 


The RDF statement in Figure 4 defines both elasses and properties. The elasses are 
River and Stream identified by the rdf : ID syntax, where River is a subelass of 
Stream and Stream is a subelass of NatuallyOccurringWaterSource, whieh 
is defined elsewhere in the RDF doeument. Also, there are two properties, 
emptiesinto and hasLength, specified by rdf : ID. While both properties have a 
domain value of class River, the range of emptiesinto is the BodyOfWater class 
and the range of hasLength is a literal string. Thus, instances of River having these 
properties must select an instance of BodyOfWater as the value for emptiesinto 
property and a literal string for the value of hasLength property. 

Although RDF provides certain semantics for knowledge representation for 
systems, it still lacks the semantic richness for creating meaningful ontologies and 
capability for inferences. To this end OWL was developed to address the semantic 
limitations of RDF. 
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C. WEB ONTOLOGY LANGUAGE (OWL): BUILDING AN ONTOLOGY 

OWL builds on RDF and RDF Schema, with enhanced eapability to deseribe 
elasses and properties. Like RDF, OWL uses URIs and deseription framework for a wide 
distribution aeross systems, neeessary sealability for the web, eompatible Web standards, 
extensibility and openness. OWL’s rieh semanties provide better interpretability than 
XML, RDF, and RDF Sehema, making it ideal as a platform independent, maehine- 
proeess language. OWL is the most appropriate language available to express explieitly 
the meaning of terms in a domain as well as relationships between them. 

There are three OWL sublanguages, namely OWL Lite, OWL DL, and OWL Full, 
in inereasingly expressive sequenee. 

OWL Lite, the least expressive of the sublanguage, is used primarily to support 
elassifieation hierarehy and simple restrietion features. And, while OWL Lite allows 
eardinality restrietions, it is limited to the values of 0 and 1. This sublanguage is an ideal 
ehoiee for developing quiek and simple taxonomies. 

OWL-DL supports maximum expressiveness while maintaining the reasoning 
system’s eomputational eompleteness and deeidability. In other words, all inferenees are 
guaranteed for eomputation and these eomputations will be eompleted in a finite amount 
of time. With some exeeptions, sueh as type separation^, OWL-DL ineludes all 
eonstruets of the OWL language and it is derived from its eonformity to description 
logic. OWL-DL was speoifieally designed to allow logie inferencing and has ideal 
reasoning system eomputational properties. 

OWL Full allows maximum expressiveness and full eapability of the RDF syntax 
without the eomputational guarantees. An OWL Full elass may be treated as an 
individual and a eolleetion of individuals simultaneously. Also, OWL Full permits an 
ontology to add meanings of pre-defmed (RDF or OWL) voeabularies. However, most 
reasoning applieation do not support all the features of OWL Full in order to maintain its 
eomputation decidability and completeness. 


3 This is when class cannot take on the role of an individual or property; or, property cannot take on 
the role of a class or individual. 
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When choosing the most appropriate OWL ontology sublanguage, developers 
should evaluate the usage of the ontology. The need for expressiveness and 
computational simplicity usually determines the choice between OWL Lite and OWL 
DL. Meanwhile, the choice between OWL DL and OWL Full depends on the need for 
RDF Schema’s meta-modeling facilities. However, the developer should understand the 
trade-off between greater flexibility and reasoning capability when choosing the right 
sublanguage of OWL. 

For the purposes of this thesis and the Ontology Aided Knowledge Discovery 
Assistant (OAKDA) application developed in this thesis, OWL-DL will be the designated 
sublanguage. Its computational completeness and decidability are crucial to the ontology 
development and use for the OAKDA application. However, most of the OWL class 
syntax and constructs discussed below applies to all sublanguages of OWL. It is in the 
description of OWL properties, restrictions, and complex classes where the focus will 
shift to OWL-DL. Unless specified otherwise, reference to OWL should be assumed to 
mean OWL-DL. 

Throughout this chapter and the next, many examples will be used to illustrate the 
syntax, constructs and methodology for building ontologies. The goal at the end of these 
two chapters is to develop a sample ontology that will be used in the OAKDA 
application. For the purpose of this thesis, the authors chose geography as the ontology 
domain of context. The main motivation is its familiarity and a relatively high general 
interest in the topic by many readers. 

Numerous GUI based ontology editors exist for developing ontologies. These 
editors simplify ontology development by generating the OWL code from the graphical 
specification. In particular, Protege4 is an ontology editor that ontology developers can 
use without the knowledge of the syntax and constructs of OWL. Although Protege does 
hide the details of OWL, it is crucial for all OWL-DL ontology developers to understand 
the constructs and semantics of OWL. Protege will be used and referenced throughout 
this thesis to illustrate and visually display examples of the Geography OWL ontology. 

4 Protege is an open-source application developed at Stanford University. More information is 
available at http://protege.Stanford.edu/index.html . May 2004. 
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1, Namespaces and Ontology Headers 

In developing an OWL ontology, it is typieal to begin the doeument by stating the 
set of vocabularies that will be used throughout the ontology. This is done by declaring 
namespaces. Figure 5 shows namespace declarations for the Geography ontology. 


<rdf:RDF 

xmlns="http://a.com/ontology/Geography#" 

xmlns:protege="http://protege.Stanford.edu/plugins/owl/protege#" 

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 

xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 

xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#" 

xmlns:owl="http://www.w3.org/2002/07/owl#"> 

Figure 5. Sample Namespace Declaration 


The first is the default namespace, referring to the Geography ontology itself. 

The next three namespaces are W3C’s predefined vocabularies. While the OWL 
constructs are defined under http://www.w3.org/2002/07/owl, and since OWL is built on 
RDF, RDF Schema, and XML Schema, all three URLs are listed as necessary 
namespaces for building an OWL ontology. 

Once the namespaces are listed, a set of assertions for the ontology can be 
grouped under the tag, owl: Ontology. This is the place to list the meta-data 
information about the ontology. Table 4 shows meta data constructs commonly used 
under the owl: Ontology heading. 
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Construct 

Description 

rdf:about 

Specifies a name or reference of the ontology. If 
the value of then the default value is the base 
URI which contains the ontology. 

rdfs:comment 

Provides a place for comments and annotations 
regarding the ontology. 

owl:priorVersion 

Provides a capability for a version control system. 

owl:imports 

Allows importing of other ontologies, with all of 
their assertions, in the current ontology. This 
should be used in coordination with the 
namespace, which allows the references to the 
imported ontologies. 

owl:AnnotationProperty 

Provides to declare properties that are used as 
annotations. 

rdfs:label 

Allows natural language labels for the ontology. 


Table 4. Meta Data Construets 


Example of these meta data construets is demonstrated in the OWL statement in Figure 6. 


<owl:Ontology rdf:about=""> 

<rdfs:comment>An example OWL ontology</rdfs:comment> 

<owl:priorVersion> 

<owl:Ontology rdf:about=" 
http://a.com/ontology/02102004/Geography"/> 

</owl:priorVersion> 

<owl:imports 

rdf:resource="http://www/geodesy.org/water/naturally- 
occurring"/> 

<rdfs:comment>Developed as part of the OWL Ontology Development 
Tutorial by Ann Lee and Edward Powers. 

</rdfs:comment> 

<rdfs:label>Geography Ontology</rdfs:label> 

</owl:Ontology> 


Figure 6. Example OWL Meta Data 

2, Basic Components of OWL 

The basic components of OWL are classes, properties, individuals, and the 
relationship between the individuals. These components are discussed in detail in the 
following sections. 

In OWL, classes may be defined in a variety of ways. Class description, as 
termed by the W3C, allows six methods for describing a class: 
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1. A class identifier (URI referenee) 

2. A eomplete list of individuals that eombined to form instanees of a elass 
(enumeration) 

3. A property restriction 

4. An intersection of two or more class descriptions 

5. An union of two or more elass deseriptions 

6. A eomplement of a elass deseription 

The six types of elass definition will be discussed in detail in the seetions below. 

3, Defining OWL Classes 

Similar to RDF, the most basie eomponent of OWL is the elass. In creating an 
ontology, classes are the meehanism for abstraeting groups of resourees that share similar 
eharaeteristies. An OWL elass is assoeiated with a set of individuals, known as the class 
extension, whieh are instanees of the elass. It is erueial to distinguish between a elass and 
its elass extension. Although they are related to each other, they are not equal. Thus, two 
different elasses ean have the same instanees without eonfliet. Those instanees will have 
eharaeteristies of both classes. 

The most straightforward method of defining a elass is by speeifying a name, 
represented syntaetieally using an URI. All OWL elasses belong to a superelass 
owl: Thing. In other words, all the user-defined elasses are subelasses of 
owl: Thing. 

For example, in the Geography ontology, the elasses Ocean, Mountain, 
and Country are defined in OWL as stated in Figure 7. 


<owl:Class rdf:ID="Ocean"/> 

<owl:Class rdf:ID="Mountain"/> 
<owl:Class rdf:ID="Country"/> 


Figure 7. OWL Class Definition by Name 


Here, the elasses Ocean, Mountain, and Country are the named elasses. 
Class definitions are speeified with owl: Class and the rdf : ID identifies the name, 
owl: Class eonstruet is a subelass of rdf s : Class, whieh an additional deseription 
logic components. Within the ontology doeument, references to these elasses are stated 

5 In OWL Full, these two statements, owl: Class and rdf s : Class, are equivalent. 
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as #Ocean, #Mountain, and #Country. OWL uses the RDF Sehema syntax 
rdf : ID to introduee the elass name as part of the class definition. 

The basic component of building a taxonomy of classes is the 
rdf s : subClassOf construct. As in a tree structure, this syntax relates the specific 
subclass to the general superclass. Thus, all the instances of the subclass are instances of 
the superclass. This relationship is transitive so that if class B is a subclass of class A and 
class C is a subclass of class B, then class C is a subclass of A. The OWL statements in 
Figure 8 exemplify the subclass relationship. 

<owl:Class rdf:ID="Mountain"/> 

<rdfs:subClassOf rdf:resource="#BodyOfLand"/> 

</owl:Class> 

<owl:Class rdf:ID="Volcano"/> 

<rdfs:subClassOf rdf:resource="#Mountain"/> 

</owl:Class> 

Figure 8. Basic Subclass Specification 

The examples in Figure 8 define two OWL classes. Mountain and Volcano. 

It explicitly states that class Mountain is a subclass of BodyOfLand and class 
Volcano is a subclass of Mountain. Therefore, by rules of subsumption. Volcano is 
also a subclass of BodyOf Land, inheriting all the characteristics of BodyOf Land as 
well as additional properties of Mountain. 

There are two components to a class definition. The first part is the name 
declaration, or reference, and the second part is the list of class description or restriction. 
Furthermore, subclasses inherit the properties and their restriction from their parent 
classes. And every restriction specified as part of the class definition further confines the 
instances of that class. In other words, the individuals are the instances of the 
intersection of all the restrictions of the class. Thus, in the Geography example. Volcano 
is bound by all the properties of the Mountain and BodyOfLand classes in addition to its 
own set of restrictions. 
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a. Disjoint Classes ( disjoints it h) 

When classes are disjointed from one another, they cannot share the same 
individuals. In other words, if the classes Ocean and Lake are disjointed from each 
other, they cannot have an individual that is a member of both classes. 


<owl:Class rdf:ID="LandlockedCountry"> 

<rdfs:subClassOf rdf:resource="Country"> 

<owl:disjointWith rdf:resource="CoastalCountry"/> 

<owl:disjointWith rdf:resource="IslandCountry"/> 

<owl:dis jointWith rdf:resource="PeninsulaCountry"/> 
<owl:dis jointWith rdf:resource="ArchipelagoCountry"/> 
</owl:Class> 

Figure 9. Disjoint Classes 


The OWL statement in Figure 9 specifies that the LandlockedCountry class 
is disjoint from all the classes listed in the statement. However, this does not assert that 
the listed classes are disjointed from each other. That is, by this description, 
CoastalCountry is not disjointed with IslandCountry. In order to state mutual 
disjointed relationships, every class must be asserted with the owl: dis jointWith 
relationship. 

4, Individuals 

As briefly discussed, the members or instances of classes are referred to as 
individuals. There are two ways of defining an individual in OWL. The two statements 
are identical in meaning. 


<Ocean rdf:ID="PacificOcean"/> 

Or 

<owl:Thing rdf:ID="PacificOcean"/> 

<owl:Thing rdf:about="#PacificOcean"/> 
<rdf:type rdf:resource="#Ocean"/> 

</owl:Thing> 


Figure 10. Instantiating OWL Individuals 


The first method of declaring an individual is by simple instantiating the class as 
in the first statement in Figure 10. The second method uses rdf : type, like in RDF, to 
link an individual to its class and it is a two-part statement. It should be clarified that 
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these two statements do not need to be adjacent. In fact, they do not even need to be part 
of the same ontology. Since Web ontologies are designed for distribution, they can be 
augmented or imported within other ontologies. Thus, the instantiation and use of 
individuals can occur in two separate ontology documents. 

5. Properties 

Properties assert general information about a class and specify concrete 
information about individuals of that class. The sections below discuss the different 
types of properties and how they are used as class restrictions. 
a. Defining Properties 

There are two types of properties, namely datatype and object properties. 
Datatype properties specify the relationships between individuals and RDF literals or 
XML Schema datatypes. Object properties represent the relationships between 
individuals of one or more classes. Several methods can be used to create a property 
relationship, whether it is datatype or object property. The most common method to 
specify an object property is limiting the domain and range of the property to individuals 
of certain classes. In Figure 11, the hasBoundary property has a range restriction of 
all the individuals of the BodyOf Land class and a domain restriction of all the 
individuals of BodyOfWater class. 


<owl:ObjectProperty rdf:ID="hasBoundary"> 

<rdfs:range rdf:resource="#BodyOfLand"/> 
<rdfs:domain rdf:resource="#BodyOfWater"/> 
</owl:ObjectProperty> 

<owl:ObjectProperty rdf:about="#hasBorder"> 
<rdfs:domain rdf:resource="#BodyOfLand"/> 
</owl:ObjectProperty> 


Figure 11. Property Restriction Using Domain and Range 


Unlike the hasBounday property, the hasBorder property in Figure 
11 shows only a domain restriction. By not explicitly stating the range restriction within 
the property definition, it is implied that all individuals, regardless of class will have the 
the default range owl: Thing. 
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Similar to classes, properties may be organized as a hierarehy. Likewise, 
a property may be defined as a subproperty, or a specialization, of another property. 
Figure 12 shows examples of the property subsumption using the eonstruet 

rdfs:subPropertyOf. 


<owl:ObjectProperty rdf:ID="hasCountryDescriptor"> 

<rdfs:domain rdf:resource="#Country"/> 

<rdfs:range rdf:resource="#CountryDescriptor"/> 

</owl:ObjectProperty> 

<owl:ObjectProperty rdf:ID="hasOfficialLanguage"> 

<rdfs:range rdf:resource="#Language"/> 

<rdfs:subPropertyOf rdf:resource="#hasCountryDescriptor"/> 
</owl:ObjectProperty> 

figure iz. Property subsumption bxampies 


Using rdf s : subPropertyOf, the hasOf f icialLanguage objeet 
property is a type of hasCountryDescriptor property, inheriting all of the parent 
property's eharaeteristies. The definition of hasCountryDescriptor is defined by 
the domain and range restrietions. Although the hasOff icialLanguage property 
inherits these restrietions from the parent property, by explieitly assigning a new range, 
the Language elass, it further restriets the possible individuals that ean fill the value of 
this property. Therefore, the range of this property is not merely the instanees of the 
Count ryDescript or elass, as speeified in the parent property, it is further eonfined 
to the instanees of the Language elass. 

Using property restrietions, it is now possible to expand on the simple 
elass definition. The elass Coast alCountry, for example, ineludes a property 
restrietion as part of its definition. 
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<owl:Class rdf:ID="CoastalCountry"> 

<rdfs:subClassOf rdf:resource="#Country"/> 

<rdfs:subClassOf> 

<owl:Restriotion> 

<owl:onProperty rdf:resource="#hasCoastline"/> 

<owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">l 
</owl:minCardinality> 

</owl:Restriotion> 

</rdfs:subClassOf> 

</owl:Class> 

Figure 13. Property Restriction in Class Description 


Consider the restriction listed under the second rdf s : subClassOf 
syntax in Figure 13. This subclass declaration is of an unnamed class, or an anonymous 
class, used as part of the CoastalCountry class definition. Under the anonymous 
class description, the restriction uses the owl: minCardinality to state that the 
individuals of this class must have at least one associated hasCoast line property 
value. Thus, the complete definition of CoastalCountry states that all individuals of 
this class are an instance of the Country class and meet the minimum cardinality 
restriction of the hasCoast line property. More will be discussed about the 
cardinality property in Section 7.b. of this chapter. 

b. Properties and Datatypes 

Datatype property values range between RDF literals (rdf s : Literal) 
and XML Schema datatypes. Table 5 is the list of XML Schema types used with OWL 
datatype properties. 
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xsd:string 

xsd:normalizedString 

xsd:boolean 

xsd:decimal 

xsd:float 

xsd:double 

xsd:integer 

xsd:nonNegativeInteger 

xsd:postiveInteger 

xsd:nonPositiveInteger 

xsd:negativeInteger 

xsd:unsignedByte 

xsd:long 

xsd:int 

xsd:short 

xsd:unsignedLong 

xsd:unsignedint 

xsd:unsignedShort 

xsd:hexBinary 

xsd:base64Binary 


xsd:dateTime 

xsd:time 

xsd:date 

xsd:gYear 

xsd:gMonthDay 

xsd:gDay 

xsd:byte 

xsd:gYearMonth 

xsd:gMonth 

xsd:anyURI 

xsd:token 

xsd:language 

xsd:NMTOKEN 

xsd:Name 

xsd:NCName 


Table 5. XML Schema Datatypes 


6, Property Characteristics 

In addition to domains and ranges, other OWL constructs allow greater semantic 
expressiveness. These OWL property characteristics provide the means for classification 
and reasoning on the ontology. 

a. Transitive and Symmetric Properties 

OWL properties can take on transitive attributes. If a property is defined 
as transitive to another property, the values of the property may have inferred 
relationships with one another. Mathematically, a transitive property is expressed as: 

P(a,b) and P(b,c) implies P(a,c) 

In the Geography ontology, locatedin is a transitive property. Figure 14 shows how 
transitive property is defined in OWL. 


<owl:ObjectProperty rdf:ID="locatedIn"> 

<rdfs:type rdf:resource="&owl;TransitiveProperty"/> 
<rdfs:domain rdf:resource="&owl;Thing"/> 

<rdfs:range rdf:resource="#PoliticalGeography"/> 

</owl:ObjectProperty> 

<Region rdf:ID="VaticanCity"/> 

<locatedIn rdf:resource="#Rome"/> 

</Country> 

<Region rdf:ID="Rome"/> 

<locatedIn rdf:resource="#Italy"/> 

</Country> 


Figure 14. Transitive Property Defined 
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The OWL statements in Figure 14 explieitly state that the individual 
VaticanCity is loeated in individual Rome and that individual Rome is loeated in 
individual Italy. Since locatedin is defined as a transitive property, it is inferred 
that VaticanCity is located in the region of Italy. 

Another OWL property construct is the symmetric attribute. If a property 
is designated as symmetric, then for any values a and b, there is the following 
relationship: 


P(a,b), iff P(b,a) 

In the Geography ontology, ad jacentTo is a symmetric property. Figure 15 shows 
how symmetric property is defined in OWL. 


<owl:ObjectProperty rdf:ID="adjacentCountry"> 

<rdfs:type rdf:resource="&owl;SymmetricProperty"/> 
<rdfs:domain rdf:resource="#Country"/> 

<rdfs:range rdf:resource="#Country"/> 

</owl:ObjectProperty> 

<Country rdf:ID="Nigeria"/> 

<locatedIn rdf:resource="#Africa"/> 

<adjacentCountry rdf:resource="#Cameroon"/> 

</Country> 


Figure 15. Symmetric Property Defined 


The OWL statements in Figure 15 explicitly state that the individual Nigeria 
has the ad jacentCountry property value of the individual Cameroon. However, 
because ad jacentCountry is a symmetric property, it can is inferred that Cameroon 
has the ad jacentCountry property value of Nigeria. In order for a property to be 
symmetric, it must have the same domain and range value. It is invalid to have a 
symmetrical relationship between two individuals that belong to different classes. 
b. Functional Property 

An OWL functional property states that there is only one value associated 
with the property. If an individual designates more than one value of a functional 
property attribute, then it is assumed that those values are the same. Mathematically, a 
functional property is stated as follows: 
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P(a,b) and P(a,c) implies a = c 

In the Geography ontology, hasCapitalCity is a functional property. Figure 16 
shows how a functional property is defined in OWL. 


<owl:ObjectProperty rdf:ID="hasCapitalCity"> 

<rdfs:type rdf:resource="&owl;FunctionalProperty"/> 
<rdfs:domain rdf:resource="#Country"/> 

<rdfs:range rdf:resource="#CapitalCity"/> 

</owl:ObjectProperty> 


Figure 16. Functional Property Defined 


For every individual of the Country class, there can only be one value 
associated with the hasCapitalCity property. Thus, for the country individual 
UnitedStates, if its hasCapitalCity property is filled with WashingtonDC 
and DistrictOf Columbia, it can be assumed that these two values are the identical. 

c. InverseOf Property 

If a property is defined with the owl: InverseOf construct of another 
property, then they have the following relationship: 

Pi(a,b) iff P 2 (b,a) 

The domain and range determines the direction of the property. That is, the property 
states the relationship from the domain, the subject, to the range, the object. However, it 
is often necessary to define another property that states the relationship in the opposite 
direction. Therefore, the “inverse of’ property inverses the domain and range of the 
initial property so that the domain becomes the new range and the range becomes the new 
domain. Figure 17 shows how two properties have an "inverse of relationship using the 
owl: InverseOf construct. 
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<owl:ObjectProperty rdf:ID="hasCapitalCity"> 

<rdfs:type rdf:resource="&owl;FunctionalProperty"/> 

<rdfs:domain rdf:resource="#Country"/> 

<rdfs:range rdf:resource="#CapitalCity"/> 

</owl:ObjectPropsrty> 

<owl:ObjectProperty rdf:ID="belongsToCountry"> 

<owl:inverseOf rdf:resource="#hasCapitalCity "/> 

<rdfs:domain rdf:resource="#CapitalCity "/> 

<rdfs:range rdf:resource="#Country"/> 

</owl:ObjectPropsrty> 


Figure 17. InverseOf Property Defined 


The OWL statements in Figure 17 indicated that the property 
belongsToCountry is an inverse property of hasCapitalCity. Hence, if the individual 
Madrid has a belongsToCountry relationship with individual Spain, then by the rules of 
the "inverse of property, Spain will automatically have a hasCapitalCity relationship 
with Madrid. Also notice the domain and range of the owhinverseOf properties. The 
domain of one property is the range of another, and vise versa. 

d. Inverse Functional Property 

The inverse functional property combines the traits of the “inverse of’ and 
functional properties. It indicates that it has an inverse of relationship with another 
property, which must be a functional property. 

The properties hasCapitalCity and belongsToCountry, shown 
in Figure 16, are in fact inverse functional properties. Since hasCapitalCity is a 
functional property, its inverse must be an myQXSQ functional property. Mathematically, 
an inverse functional property is expressed as follows: 

P(a,b) and P(c,b) implies a=c 

In the Geography ontology, the belongsToCountry property is defined as inverse 
functional property, shown in Figure 18. 
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<owl:ObjectProperty rdf:ID="belongsToCountry"> 

<rdfs:type rdf:resource="&owl;InverseFunctionalProperty"/> 
<owl:inverseOf rdf:resource="#hasCapitalCity "/> 

<rdfs:domain rdf:resource="#CapitalCity "/> 

<rdfs:range rdf:resource="#Country"/> 

</owl:ObjectProperty> 


Figure 18. Inverse Functional Property Defined 


If the individual WashingtonDC has a belongsToCountry relationship 
with UnitedStated, and DistrictOf Columbia also has a 
belongsToCountry relationship with UnitedStated, then by the rules of inverse 
functional properties, WashingtonDC and DistrictOf Columbia are identical 
objects. 

An inverse functional property is similar to a unique key in a database. 

The domain and range of the inverse functional properties create a unique identifier 
combination, where one element of the domain is always associated with a particular 
domain of the range. 

7. Property Restrictions 

In addition to the variety of property characteristics discussed above, classes may 
use property restrictions as part of their description. The restrictions are defined by the 
syntax owl: Restrictions and owl: onProperty. 

a. someValuesFrom and allValuesFrom Restrictions 
Although the domain and range restrictions of a property apply to all 
classes using that property, a class definition may further confine the property value at 
the local level. The restriction syntax owl: someValuesFrom states that for every 
instance of the class using that particular property, the values of the property must have at 
least one individual of the class specified by the owl: someValuesFrom clause. 

Figure 19 shows a class definition of IslandCountry and the use of 
owl: someValuesFrom to limit the hasLandType property value. 
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<owl:Class rdf:ID="IslandCountry"> 

<rdfs:subClassOf rdf:resource="#Country"/> 

<rdfs:subClassOf> 

<owl:Restriction> 

<owl:onProperty rdf:resource="#hasLandType"/> 
<owl:someValuesFrom rdf:resource="#Island"/> 
</owl:Restriction> 

</rdfs:subClassOf> 

</owl:Class> 


Figure 19. owFsomeValuesFrom Example 


The owl: someValuesFrom restriction states that the individuals of 
IslandCountry with the property hasLandType value must have at least one 
individual belonging to the Island class. This restriction on the property only applies 
to this class and its subclasses and not other that use the hasLandType property, such 
as its sibling class LandlockedCountry. 

The owl: allValuesFrom restriction states that j/an instance of a 
class has any value for this restricted property, they all must be a value from the specified 
class of individuals. Unlike the owl: someValuesFrom restriction, which requires 
the property to have at least one value, the owl: allValuesFrom restriction allows the 
property to have a null value. The owl: allValuesFrom is often used in conjunction 
with owl: someValuesFrom as a closure axiom for a property restriction. If the 
developer's intention is to restrict the property value to only individuals a certain class, 
rather than at least one value, then owl: allValuesFrom should be used with 
owl:someValuesFrom. 
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<owl:Class rdf:ID="IslandCountry"> 

<rdfs:subClassOf rdf:resource="#Country"/> 

<rdfs:subClassOf> 

<owl:Restriction> 

<owl:onProperty rdf:resource="#hasLandType"/> 

<owl:someValuesFrom rdf:resource="#Island"/> 

<owl:allvaluesFrom rdf:resource="#Island"/> 

</owl:Restriction> 

</rdfs:subClassOf> 

</owl:Class> 

Figure 20. owFsomeValuesFrom & owFallValuesFrom Example 


Figure 20 shows a elass deseription using both owl: someValuesFrom and 
owl: allValuesFrom restrietions. This implies that all individuals of 
IslandCountry with the hasLandType property value ean only have values of 
Island individuals. 

b. Cardinality Restriction 

Cardinality specifies the minimum, maximum, or the exact number of 
values in a property relationship. Unless cardinality is specified, it is assumed that there 
is an unlimited possible property values. Cardinality is important when the class 
description is based on a specific number of property attributes. For example, when 
defining a bi-coastal area, the class description must specify that it borders the ocean on 
two sides of the land. Likewise minimum and maximum cardinalities put a specific 
restriction on a class. 

Figure 21 shows a Geography class BicoastalCountry definition. 


<owl:Class rdf:ID="BicoastalCountry"> 

<rdfs:subClassOf rdf:resource="#CoastalCountry"/> 

<rdfs:subClassOf> 

<owl:Restriction> 

<owl:onProperty rdf:resource="#bordersOcean"/> 

<owl:cardinality rdf:datatype="&xsd;nonNegativelnteger">2 
</owl:cardinality> 

</owl:Restriction> 

</rdfs:subClassOf> 

</owl:Class> 

Figure 21. Cardinality Example 
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Based on the OWL statement, the owl: cardinality constructs restrict the 


class to have two bordersOcean property values. Therefore, all individuals of 
BicoastalCountry using this property must specify exactly two property values. 

Likewise, owl:maxCardinality and owl:minCardinality set 

the upper and lower bounds of the property cardinality. Used in combination, they limit 
the property to a numeric range. 

c. owUhasValue Restriction 

The owl: hasValue construct restricts the class definition by specifying 
the exact value of the specified property. Hence, individuals of the class must have at 
least one of its property values equal to the owl: hasValue restriction. 


<owl:Class rdf:ID="Lake"> 

<rdfs:subClassOf rdf:resource="#BodyOfWater"/> 
<rdfs:subClassOf> 

<owl:Restriction> 

<owl:onProperty rdf:resource="#hasSaline"/> 
<owl:hasValue> 

<SaltContent rdf:ID="NotSalty"/> 

</owl:hasValue> 

</owl:Restriction> 

</rdfs:subClassOf> 

</owl:Class> 


Figure 22. owhhasValue Example 


Figure 22 is a class definition of Lake, which specifies that its hasSaline must 
have a property value ofNotSalty. This declares that all individuals of Lake must 
have at least one hasSaline property value that equals NotSaltyto satisfy this class 
requirement. Similar to the allValuesFrom and someValuesFrom, this restriction 
is only applied to the local class. 

d. Equivalent Classes and Properties 

The owl: equivalentClass indicates that two classes have exactly 
the same class extensions or set of individuals. This construct has a variety of use. First, 
when adopting multiple ontologies, it is used to map one class to another if they are 
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identical with different class names. Second, it provides another method for defining 
classes. The owl: equivalentClass allows classes to be defined by a set of 
restrictions. 


<owl:Class rdf:ID="BodyOfWater"> 

<owl:equivalentClass> 

<owl:Restriction> 

<owl:onProperty rdf:resource="#consistsOf"/> 
<owl:someValuesFrom rdf:resource="#Water"/> 

<owl:allvaluesFrom rdf:resource="#Water"/> 

</owl:Restriction> 

</owl:equivalentClass> 

</owl:Class> 

Figure 23. owl:equivalentClass Example 


The OWL statements of Figure 23 indicate that the definition of BodyOfWater 
class is exactly equivalent to all the classes that meet the restriction requirement that the 
consist sOf property value is only individuals of the Water class. Although this 
description may be used with therdfsisubClassOf, as discussed above, that would 
have a different implication. The restriction description under rdfsisubClassOf 
states a necessary condition while owl: equivalentClass goes a step further to 
create a necessary and sufficient condition. Thus, if the OWL statement in Figure 23 
used rdfsisubClassOf, it would imply that individuals that has the consistOf 
property value of water may or may not be the same as BodyOfWater individual; 
however, the description using owl: equivalentClass declares that all things that 
consists of water must be a BodyOfWater. The definitions of rdfs : subClassOf 
and owl: equivalentClass are stated in Table 6. 


Relationship 

Implication 

rdfs:subClassOf 

BodyOfWater(a) implies consistsOf(a,b) & Water(b) 

owl:equivalentClass 

BodyOfWater(a) implies consistsOf(a,b) & Water(b) 
consistsOf(a,b) & Water(b)implies BodyOfWater(a) 


Table 6. Construct for Necessary vs. Necessary & Sufficient Conditions 
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Similarly, properties may also use the construct owl: equivalenProperty to 
link properties together. That is, any two properties tied with this syntax have exactly the 
same value, or property extension. 

e. Individual Equivalence Using owhsameAs 
Similar to the construct used for declaring that two classes are equivalent, 
the construct owhsameAs is used to declare that two individuals are equivalent.. Figure 
24 shows how the owhsameAs construct is used to indicate that the individual America 
is identical to the individual UnitedStatesOf America. 


<Country rdf:ID="America"> 

<owl:sameAs rdf:resource="#UnitedStatesOfAmerica"> 
</Country> 


Figure 24. owhsameAs Example 


This syntax is also useful when linking multiple ontologies, 
owl: sameAs construct allows equating individuals from different OWL documents. 

The fact that individual equivalence or distinction is made explicitly implies that OWL 
does not assume uniqueness based on name. The above OWL statement asserts 
equivalence between two individuals. However, the same assertion may be inferred 
using a functional property. Given that the hasCapitalCityisa functional property, 
as defined above, the statement in Ligure 25 states that the two individuals are equivalent. 


<Country rdf:ID="UnitedStates"> 

<hasCapitalCity rdf:resource="#DistrictOfColumbia"/> 
<hasCapitalCity rdf:resource="#WashingtonDC"/> 
</Country> 


Ligure 25. Individual Equivalence Using Lunctional Property 


Since hasCapitalCityisa functional property, it is simply inferred that 

DistrictOfColumbia is the same as WashingtonDC. 

/. Individuals Differences Using owhdifferentFrom & 
o whAllDifferent 

The inverse OWL construct of owl: sameAs is 
owl: dif f erentFrom. This construct is used to make individuals explicitly distinct 
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from one another. This is important when using individuals as property values. Figure 
26 illustrates a Geography example for the distinet individuals of Climate. 


<Climate rdf:ID="Dry"/> 

<Climate rdf:ID="TropicalHumid"/> 

<owl:differentFrom rdf:resource="#Dry"/> 

</Climate> 

<Climate rdf:ID="Highland"/> 

<owl:differentFrom rdf:resource="#Dry"/> 

<owl:differentFrom rdf:resource="#TropicalHumid"/> 
</Climate> 


Figure 26. owhdifferentFrom Example 


One way to assert distinction between individuals is to use the construct 
owl: dif f erentFrom. By making these individuals explicitly distinct, it ensures that 
the properties do not assume equivalence for these individuals. That is, if a functional 
property tries to fill its value with two explicitly distinct individuals, an error would be 
raised. In the Geography ontology, the functional property, hasClimate, cannot have 
both Dry and TropicalHumid individuals as values of one instance. Since these two 
individuals are explicitly unique, by the owl: dif f erentFrom construct, they cannot 
be made equivalent by a functional property. 

Another method for declaring individual distinction is by using the 

owl: AllDif f erent and owl: dist inctMembers constructs^. 


<owl:AllDifferent> 

<owl:distinctMembers rdf:parsetype="Collection"/> 
<Climate rdf:about="#Dry"/> 

<Climate rdf:about="#TropicalHumid"/> 

<Climate rdf:about="#Highland"/> 

</owl:distinctMembers> 

</owl:AllDifferent> 


Figure 27. owhAllDifferent & owhdistinctMembers Example 


6 The owl: dist inctMembers can only be used in combination with owl: AllDif f erent. 
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Figure 27 shows how to distinguish all the unique individuals in one OWL 
statement, rather than with eaeh individual deelaration. The statement in Figure 27 is 
semantieally identieal to the OWL statements in Figure 26. 

8, Complex Classes 

There are additional eonstructs used to ereate elasses or class expressions. Class 
expressions are nested elass descriptions, without the need for naming each “intermediate 
class” separately. Class expression allows for complex classes using set operations. This 
is done using anonymous classes or value restricted classes. Specifically, there are three 
types of set operations, namely union (owl: unionOf), intersection 
(owl: intersect ionOf), and complement (owl: complementOf). These 
constructs can be thought of as the “and”, “or”, and “not” logical operators. Another 
method of creating complex classes is by enumeration, where a class is described by 
exhaustively listing the individuals using the owlioneOf construct. 

a. Set Operators (intersectionOf, unionOf complentOf) 

Using the set operations as a class description is closer to a “definition” 
than other class description discussed thus far. That is, the membership of class is wholly 
determined by the set operation specification. The set operator “intersection of’, 
owl: intersect ionOf, describes a class with individuals that belong to all 
specifications listed under the class description. One example from the Geography 
ontology using the owl: intersectionOf set operation is the IslandCountry 
class. 


<owl:Class rdf:ID="IslandCountry"> 

<owl:intersectionOf rdf:parseType="Collection"> 
<owl:Class rdf:about = "Country" /> 

<owl:Restriction> 

<owl:onProperty rdf:resource="#hasLandType"/> 
<owl:someValuesFrom rdf:resource="#Island"/> 
<owl:allvaluesFrom rdf:resource="#Island"/> 
</owl:Restriction> 

</owl:intersectionOf> 

</owl:Class> 


Figure 28. owhintersectionOf Example 
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In Figure 28, IslandCountry is strictly the intersection of Country class and 
the set hasLandXype property values of the Island individuals. Therefore, all individuals 
of IslandCountry class must belong as extensions of both of these specifications. Notice 
the use of the syntax rdf;parsetype=”Collection” within OWL’s intersection syntax. As 
discussed in the RDF section, it is used to exhaustively list membership of a class. This 
construct will be used with the other set operations as well. 

The second set operation is the “union of’, with the owl: unionOf 
construct. Unlike owl: intersect ionOf, the owl lunionOf operator describes a 
class with individuals that belong to at least one of the specifications listed under the 
class description. 


<owl:Class rdf:ID="SaltyBodyOfWater"> 

<owl:unionOf rdf:parseType="Collect!on"> 
<owl:Class rdf:about="#Ocean"/> 
<owl:Class rdf:about="#Sea"/> 

</owl:unionOf> 

</owl:Class> 


Figure 29. owhunionOf Example 


In Figure 29, the rules on the union logic imply that the 
SaltyBodyOfWater class includes the individuals of both the Ocean and Sea classes. 
Since the union set operator is an “or” logic, the description above states that all the 
individuals of SaltyBodyOfWater are made up of individuals of Ocean or Sea. 

The third operator is owl: complementOf . This construct is the logic 
“not,” where the class describes all the individuals that do not belong to the specified 
class extension. 


<owl:Class rdf:ID="PhysicalGeography"/> 

<owl:Class rdf:ID="PoliticalGeography"> 

<owl:complementOf rdf:resource="#PhysicalGeography"/> 
</owl:Class> 


Figure 30. owlxomplementOf Example 
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In the OWL statement in Figure 31, the members of the 
Polit icalGeography include all individuals that are not members of the 
PhysicalGeography class. Since this construct can include a very large set of 
members, it is often used in combination with other operators, as in Figure 31. 


<owl:Class rdf:ID="NonSaltyBodyOfWater"> 

<owl:intersectionOf rdf:parseType="Collection"> 

<owl:Class rdf:about="#BodyOfWater"/> 

<owl:Class> 

<owl:complementOf rdf:resource="#SaltyBodyOfWater/"> 
</owl:Class> 

</owl:intersectionOf> 

</owl:Class> 


Figure 31. owlxomplementOf & owhintersectionOf Example 


In Figure 31, the NonSaltyBodyOfWater class is an intersection of two 
classes, one named and one anonymous. The class description, using two set operators, 
states that individuals of this class must be of both BodyOfWater and NOT a member of 
the SaltyBodyOfWater class. 

b. Enumerated Classes (oneOf) 

Another method of defining a class is by direct enumeration of all of its 
members or individuals. Using the owl: oneOf construct, the class is described by 
exhaustively listing all the individuals that make up the class. No other individual, other 
than those included in the list, can be a member of the class. 


<owl:Class rdf:ID="SaltContent " > 

<rdfs:subClassOf rdf:resource="PhysicalDescriptor"> 
<owl:oneOf rdf:parsetype="Collection"/> 

<SaltContent rdf:about="Salty"/> 

<SaltContent rdf:about="NotSalty"/> 

</owl:oneOf> 

</owl:Class> 


Figure 32. Enumeration Example 


In Eigure 32, the SaltContent class is defined by enumeration; listing all 
the members. Salty and NotSalty, of the class. In order for this definition to be valid, 
every individual must be declared correctly and they must all belong to a named class. It 
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is not required for the individuals to be deelared as a member of the elass being defined, 
although that is often the most logieal. 


D, CONCLUSION 

This chapter covered all the RDF and OWL semantics necessary to develop a 
OWL-DL ontology. Understanding these constructs is crucial to building a semantically 
rich ontology for the Web or other Web-related applications. Although there are OWL 
generating ontology editors available, such as Protege, without learning how and when 
the OWL semantics are used, it is not possible to build a valid ontology. This chapter 
should serve as the foundation to create a useful and meaningful ontology. 

Having gained an understanding of all the basic components of the RDF and 
OWL, the next step is learning the process for developing an ontology. The next chapter 
describes a methodology of ontology development. 
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III. ONTOLOGY DEVELOPMENT METHODOLOGY 


A. INTRODUCTION 

It is generally agreed upon that ontologies are the knowledge representation 
eomponent of the Semantic Web. Although the realization of the Semantic Web is still a 
distant goal, there is a growing interest for ontologies to be incorporated into current 
technologies. Many disciplines are seeing the immense value of ontologies as a way to 
codify a common set of information or knowledge to be shared across multiple 
applications. It provides users with a consistent and agreed-upon knowledge base that 
both humans and machines can process. While no ontology can model all the nuances of 
any domain area, it is possible and valuable to abstract the major concepts and how they 
relate to one another. A valid knowledge representation system that is widely used and 
shared could save effort for those who lack access to subject matter experts (SMEs). 
Likewise, SMEs are motivated to provide users and applications with basic domain 
knowledge through the development of ontologies, thus providing users with consistent 
sets of information that they can maintain and manage. 

Eollowing the discussion of OWL-DL constructs presented in the previous 
chapter, this chapter examines a methodology for developing an OWL-DL ontology. 

This process involves modeling the real world concepts and their relationships into OWL 
classes, properties and instances. Although ontology development may be easily 
understood, ontologies should be developed by SMEs or domain experts, particularly in 
complex or technical domain areas, such as biomedicine or aircraft components. This 
chapter addresses an approach of transferring domain knowledge into a valid ontology. It 
is important for ontology developers to understand that building a meaningful ontology is 
a highly iterative process. The greater the complexity of the domain and its scope, the 
more iterations or cycles of development is required. 

When an ontology is developed in OWL-DL, one can use a reasoning or an 
inference application to correctly classify the concepts of the domain based on their 
descriptions. As mentioned in Chapter 2, OWL-DL derives its semantics from 
description language, which is a description logic formalism for representing logical 
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meaning for reasoning computations. The capacity of inferencing is a highly valuable 
component, especially as the ontology grows in size and complexity. There are 
applications, such as CLASSIC and RacerPro, dedicated to processing the description 
logic languages. For ontology development in this chapter and the OAKDA application 
in the latter chapters, the RacerPro^ will be the reasoning and inferencing engine of 
choice for OWL-DL ontologies. 

Throughout this chapter, the authors use the Protege ontology development 
environment to illustrate the process of building an ontology. Although there are various 
OWL-DL ontology development environments. Protege provides a user-friendly plug-in 
for RacerPro and a graphic user interface (GUI) that hides the details of the OWL syntax 
from the developer. While it is crucial to understand the OWL-DL constructs in building 
a valid ontology, the purpose of this chapter is to understand the process and 
methodology rather than the syntax. Thus, the focus will shift away from the language 
constructs to the developmental steps and key concepts unique to developing an OWL- 
DL ontology. 

The rest of this chapter is dedicated to understanding the purpose and method of 
developing an ontology. Section B will discuss why ontologies are important and useful 
as a knowledge representation system. Section C surveys previous work on 
methodologies for building ontologies. Section D details a proposed seven-step 
development methodology using Geography as the domain of interest. Section E 
discusses other relevant considerations, such as exporting existing ontologies. 


B. THE PURPOSE OF BUILDING AN ONTOLOGY 

The word ontology has its origin in the philosophy discipline. Ontology is 
defined in the Merriam-Webster dictionary as “(1) A branch of metaphysics concerned 
with the nature and relations of being; (2) A particular theory about the nature of being or 
the kinds of existents.” [http://www.m-w.com/, January 2006] This definition pertains to 

a particular study of metaphysical philosophy concerning the nature of existence and its 

7 RacerPro is a DL inference system. RacerPro, which stands for Renamed ABox Concept Expression 
Reasoner, is a knowledge representation application using highly optimized tableau calculus for description 
logic expressions. It provides reasoning for multiple ABoxes and TBoxes. 
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experience. This concept is usually referred to as the “Big O” Ontology because it 
defines a named idea in philosophy. However, there is also what is known as the “little 
o” ontology, which began in the field of Knowledge Management and it is widely 
adopted in Information Technology and AI disciplines. Specifically, Thomas Gruber of 
the Knowledge Systems Laboratory at Stanford University first defined an ontology as 
“specification of conceptualization.” More explicitly, an ontology is “a description (like 
a formal specification of a program) of the concepts and relationships that can exist for an 
agent or a community of agents. This definition is consistent with the usage of ontology 
as set-of-concept-defmitions, but more general.” [Gruber, 1993, 199] Gruber’s definition 
is further expounded to state that the “little o” ontology has two parts, namely that (1) it 
describes and represents an area of knowledge, and (2) it defines “the common words and 
concepts of the description.” [Daconta et ah, 2003, 186] 

Using ontologies as a system of knowledge representation is an important 
contribution made by the field of Knowledge Management. Although there are other 
systems or framework for knowledge representation, ontologies provide a reusable, 
sharable and platform neutral knowledge construct. According to Gruber, many other 
knowledge systems are "isolated monoliths characterized by high internal coupling and a 
lack of external coupling interfaces that would enable the developer to reuse software 
tools and knowledge bases as modular components.” [Gruber, 1991, 1] Thus, he argues 
that the only possible method of sharing and reusing these knowledge bases is to import 
the knowledge representation system and its programming environment. However, using 
the software engineering approach of “decomposing” these indivisible systems, these 
knowledge bases should be broken into accumulable, sharable and reusable modular 
building blocks. Gruber states that three decomposition techniques are often used in AI 
for software development. These are using declarative knowledge representation, 
separate the knowledge from the program; identifying the classes and relationships 
inherent in the application-specific facts and reorganize the knowledge to allow 
inheritance from these constructs; and characterizing general problem solving tasks (i.e. 
classification) and inferencing classes (i.e. subsumption) and design corresponding 
algorithms and methods [Gruber, 1991, 2]. However, he further argues that these 
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strategies alone are insufficient to ensure sharability of the system. In addition to 
formalizing declarative knowledge, organizing class and relationship hierarchies, and 
characterizing tasks and inferences, Gruber introduces the need for canonical form of the 
declarative knowledge and common ontologies. First, “canonical form of the declarative 
knowledge” is a standard set of syntax and constructs or “semantics” that will be used as 
the knowledge representation language. Second, common ontologies are “vocabularies 
of representational terms - classes, relations, functions, object constraints - with agreed- 
upon definitions, in the form of human readable text and machine-enforceable, 
declarative constraints on their well-formed use.” [Gruber, 1991, 2] The canonical form 
and common ontologies are two necessarily elements that allow the knowledge 
representation systems to be shared and reused across multiple platforms. 

Based on Gruber's broad definition, an ontology can take on various forms. 

Types of an ontology may be as basic as a simple catalog, a finite list of terminology, and 
as semantically sophisticated as logical abstraction for disjointed and inverse 
relationships, shown in Figure 33 
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Figure 33. Ontology Spectrum 


The Ontology Spectrum [McGuiness, 2001, 3] diagram shows the progression of 
concept organization. As ontologies move from simple taxonomies to a structured 
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knowledge base with properties and restrictions, their need for expressiveness grows. 

The right side of the dotted line in Figure 33 indicates where OWL-DL constructs 
become relevant. At the far right end of the spectrum, OWL-DL becomes imperative as 
an ontology language. Furthermore, an inference engine uses OWL's description logic 
capability to verify consistency and completion. It mathematically checks for 
consistency and makes inference where it deems the relationships to be incomplete. 

These are crucial elements of a meaningful ontology because applications and systems 
rely on valid knowledge representation. The ontologies of concern in this thesis fall in 
the farthest end of the Ontology Spectrum of Figure 33. 

Given the wide range of ontologies, why are they important or even relevant as an 
information system? Natalya Noy and Deborah McGuinness argue five specific reasons 
for developing and using an ontology (Noy et. ah, 2002, 3). 

• Share a common information structure - By using an ontology that creates a 
common language amongst disparate systems, it becomes possible to share the 
same set of terms and concepts. This also allows agents to aggregate and extract 
information from other systems and use them appropriately to answer queries. 

• Reuse domain knowledge - Reuse is an important benefit of ontologies because it 
allows separate components to join together seamlessly. By integrating existing 
ontologies, developers can rely on a trusted source of domain expert of a 
particular component. Rather than recreating something from nothing, existing 
ontologies can integrate into complex knowledge bases. 

• Make domain assumption explicit - By incorporating domain assumptions into the 
ontology, rather than hard-coding it into a system, it makes it easier to manage 
and change them. It also helps “users” understand and learn the concepts and 
relationships within the domain. 

• Separate domain and operational knowledge - The ontology, representing the 
domain knowledge, is disconnected from the application, which represents the 
operational knowledge. This kind of low coupling is valuable in managing 
change in complex systems. 

• Analyze domain knowledge - The accessibility of ontologies makes it possible to 
analyze and validate domain knowledge. This is a key part of reusing and 
extending any ontology and is a valuable asset to developers and users alike. 


In spite of all these reasons, representing real-world concepts and complex 
relationships in simplified two-dimensional ontologies is not only difficult, but experts 
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find it insufficient to model all the intrieate relationships of a domain. Although OWL- 
DL has eonstruets rieher in semanties than its predeeessors sueh as RDF, it is still a 
modeling language for representing the real world eoneepts and relationships. Sinee 
semantie-riehness must be juxtaposed with eomputability, there is a eonfliet between 
“transpareney and predietability” and “rigor and eompleteness,” when eonsidering the 
design and development of an ontology [Reetor, 2004, 5]. That is, aeeording to Reetor, 
the SME’s preferenee for representing the domain in its praetieal way may not align with 
“logie and eomputational traetability.” Henee, the developer must eonstantly balanee the 
two perspeetives. The tradeoff is between the domain expert’s preferenee for rieh 
representation of reality and the logies of ealeulation for the sake of inferenee and 
olassifieation. 

The goal of the ontology development is not only to represent the domain 
eoneepts and properties, but also to ereate a doeument that ean be proeessed, ineluding 
inferenees, by maehines. Therefore, rather than trying to model all the eomponents of a 
partieular domain, it is preferred to limit to seope to a partieular area of interest or 
applieation. Before starting the development, the SME should first ask, how will the 
ontology be used? The answer to this question determines the seope of the ontology and 
the purpose of building it, whieh affeet the overall design and strueture of the ontology. 


C. METHODOLOGIES FOR ONTOLOGY DEVELOPMENT 

The growing number of ontologies developing in a variety of fields has lead to 
many proposed methodologies. All these different methods sprung from different 
domains and neeessities, and they all bring important lessons for future developers. This 
seetion will review some of the widely known and used methodology for ontology 
developments. 

1, Toronto Virtual Enterprise (TOVE) 

The first known methodology derives from the experienees of Toronto Virtual 
Enterprise (TOVE) ontology development. The TOVE methodology steps are as follows: 

1. Motivating Scenarios - These depiet the set of problems faeing an 
organization, whieh are deseribed in seenario stories or examples. 
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2. Informal Competency Questions - Based on the seenarios from Step 1, these 
are informal questions that the ontology should be able to answer. 

3. Terminology Specification - All the objeets and their relationships are defined 
at this step in first order logie. 

4. Formal Competency Questions - All the ontology terminologies are 
formalized. 

5. Axiom Specifications - All the axioms that speeify the objeet definitions and 
eonstraints are formally deseribed. Theses axioms must satisfy and answer 
the eompeteney questions stated in Step 4. 

6. Completeness Theorem - This is the evaluation stage of the development, 
where the ontology is tested to meet all the required eonditions. 

It is argued that the most interesting aspeet of the TOVE approaeh is its 
evaluation proeess using eompleteness theorem. The theorem are important for 
“assessing the extensibility of an ontology - any extension must be able to preserve the 
validity of the eompleteness theorems - or to provide a benehmark for ontologies” [Jones 
etal, 1998]. 

2. METHONTOLOGY 

Similar to TOVE, METHONTOEOGY foeuses on the assessment and 
maintenanee of the ontology. The major differenee between the two is that 
METHONTOEOGY foeuses mainly on the maintenanee within the life eyele whereas 
TOVE uses formal teehniques for addressing limited areas of maintenanee [Jones et ah, 
1998]. The seven steps of METHONTOEOGY are as follows: 

1. Specification - This step states the purpose of the ontology as well as the 
users, applieation, seope, and the required level of formality. The output of 
this step is a “natural-language ontology speeifieation doeument” [Gomez- 
Perez et ah, 1995]. 

2. Knowledge Acquisition - In parallel with Step 1, the developer finds the 
souree of the ontology domain knowledge in the form of interview with SMEs 
and analyses of literature. 

3. Conceptualization - The terms of the domain are speeified as eoneepts, 
instanees, verb relations or properties. 

4. Integration - Eor the sake of uniformity between different ontologies, 
speeifieations from other ontologies are eonsulted and ineorporated. 

5. Implementation - The ontology is developed into a formal language, sueh as 
OWE. 

6. Evaluation - Signifieant attention is paid to this step of the methodology, 
using different teehniques to determine the validity and verifieation of the 
ontology. A set of guidelines are used to seareh for ineompleteness, 
ineonsisteneies, and redundaneies. 

Documentation - All the steps and the ontology life eyeles are doeumented. 
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7. 



When the ontology is in the prototype phase, the emphasis is on the speeification, 
eonceptualization, formalization, integration and implementation steps of the lifecyele. 
However, after the ontology matures into the maintenanee phase, the developer must shift 
the focus to knowledge acquisition, evaluation and documentation of the ontology [Jones 
etal, 1998], 


3. KBSIIDEF5 

The IDEF5 methodology proposes a set of “guidelines” rather than systematic 
rules for developing an ontology. It suggests that ontology engineering is an open-ended 
process that should be constantly refined and updated. The IDEF5 guidelines are as 
follows: 

1. Organizing and Scoping - In the form of a purpose statement, this step 
establishes the objectives and context of the ontology that will be used as a 
“completion criteria.” 

2. Data Collection - The necessary data is collected using the knowledge 
acquisition techniques, such as expert interviews and protocol analysis. 

3. Data Analysis - This step analyzes the data collected in Step 2. All the 
domain’s objects of interests are identified and the boundaries of the ontology 
are defined. 

4. Initial Ontology Development - A draft of the ontology is developed using 
“proto-concepts,” which are preliminary specifications of objects, properties 
and relationships. 

5. Ontology Refinement and Validation - The “proto-concepts” from Step 4 are 
tested and refined through multiple iterations using deductive validation 
methods. 

Two representation languages assist in the continual refinement process of the 
IDEE methodology. The first is the schematic languages, which are graphical notations 
used mostly to facilitate the communication between the ontology developer and the 
domain expert. The second is the elaboration language, which is a more structured 
representation of the ontology objects and relationships [Jones et ah, 1998]. 

The three ontology development methods presented above are equally valid and 
bring important lessons to ontology developers. However, this thesis proposes a new 
ontology development methodology that is unique to developing an OWL ontology. The 
goal is to incorporate the different methods proposed above as well as provide a useful 
step-by-step guidance using OWL as the knowledge representation language of the 
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ontology. Similar to three methodologies reviewed above, the following methodology is 
derived from the lessons learned by the authors while developing the Geography 
ontology. 


D, THE STEPS TO DEVELOPING AN ONTOLOGY 

Three basic approaches are used for developing ontologies: top-down, bottom-up, 
or a combination approach. Although no one method is best, the combination approach, 
that starts with identifying the most obvious concepts and then incorporating the less 
salient concepts later, is better aligned with the iterative process recommended in 
building an ontology, as well as other complicated systems or applications. The ultimate 
goal is to follow a process leading to a good design and proper structure of the ontology. 
Regardless of how well planned, ontology development should be a cyclical and iterative 
process. It is recommended that the developer consider the structure of the ontology in 
multiple iterations, similar to the life cycle approach of software development. An 
example of the life cycle approach for software development is the Boehm’s spiral 
model, shown in Figure 34 (Boehm, 1998, 61). The model includes all the required 
phases of requirements, analysis, design, coding, testing and operations. 
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Figure 34. Boehm's Development Spiral 


The steps outlined below fit into this lifecycle phases. While this chapter outlines the 
first iteration of the process, developers should expect to follow multiple repetitions of 
the methodology. 

We propose an ontology development methodology that consists of seven steps. 
Similar to the spiral model, these steps are applied iteratively and the developers may 
find themselves going back to earlier steps to edit their initial work. The seven steps are 
as follows: 

1. Determine the scope and application of the ontology 

2. List relevant concepts of the domain 

3. Create the class hierarchy 

4. Define properties 

5. Describe classes using property restrictions and complex definitions 

6. Classify ontology with a reasoning tool 

7. Create individuals and fill property values 

Each of these steps will be discussed in detail in the sections that follow. These steps will 
be used to build the example Geography ontology presented in the previous chapter. The 
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choice of the Geography domain is based on the faet it is a eommonly understood domain 
and thus will help the reader understand the proeess of building an ontology.. 

1, Determine the Scope and Application of the Ontology 

This first step is the most important steps of the overall proeess and determines 
the final outeome of the ontology. The domain expert must understand well the purpose 
of the ontology. Often, the purpose of an ontology is two-folds. If an ontology is to 
represent the knowledge base of a partieular domain or segment of a domain, it will 
potentially funetion to “answer” all general questions relating to that domain. A seeond 
reason for developing an ontology is their use as knowledge representations in speeifie 
applieations. For a given ontology, the requirements for these two goals, to serve as a 
knowledge database for a speeifie applieation and as a generie knowledge representation 
model of a partieular domain, may be eonfiieting. Therefore, the developer must 
eompromise the demand for speeifieity and generality of scope in order to ereate a useful 
ontology. The developer should earefully manage the seope and depth to develop a 
realistie and eoherent ontology that serves the purpose of the applieation. 

To help developers determine the seope of a given ontology, a series of 
competency questions was developed by Gruninger et al. (Gruninger and Fox, 1995). 
These are questions that the ontology is expeeted to answer for the applieation on hand 
and, therefore, these questions ean help determine the seope of the ontology. The list 
should inelude broad and speeifie questions, aeting as the litmus tests to aseertain the 
neeessary level of detail. 

The seope and purpose of the Geography ontology is to define the basie physieal 
and politieal geographies and represent the relationships between them for the purpose of 
using it with OAKDA applieation, whieh will mine it to provide meaningful eontext to 
tailor user web searehes. It should represent the high-level understanding of geopolities - 
the physieal geographie eharaeteristies existing within different types of politieal entities. 
We will use this example ontology in the seetions that follow to demonstrate the 
development methodology of an OWL-DL ontology. 
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Example competency questions that are used to help determine the scope of the 
Geography ontology include the following: 

• Which non-democratic countries are landlocked? 

• What bodies of water border all bi-coastal countries in the northern 
hemisphere ? 

• What is the most common climate of island countries? 

• What river runs through the most countries ? 

• Which continent has the greatest number of coastal countries? 

These competency questions should be used to guide the development process 
and are applied repeatedly during the phases of the methodology as well as between 
iterations to ensure that the ontology fulfills its purpose. 

2. List Relevant Concepts of the Domain 

Once the scope is broadly defined, this step enumerates, in no particular order, the 
main concepts of the domain of interest. Although the final ontology may not necessarily 
include all the concepts defined during this phase, the developer should list as many 
relevant concepts as possible. At this point, one should not be concerned with 
overlapping concepts, the relationships between them, or their properties. The goal of 
this step is to create a comprehensive list of the concepts of the domain in preparation for 
the subsequent steps of development. 

Although as indicated, the properties and relationships of concepts should not be 
considered in this step, bearing in mind the main properties of concepts and how one 
concept relates to another could generate useful ideas of other related concepts. Another 
useful technique is to group related concepts into relevant categories. However, these 
categories should not be too narrow, which introduces difficulties further along in the 
process. It is important to note that this is still an informal stage of the development, 
where the SME should be more concerned about generating ideas rather than hard-coding 
specific concepts into categories. 

Eor the Geography example, the relevant concepts of the domain include the 
following: 
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ocean, sea, lake, river, mountain, land, plains, valley, desert, 
tropics, climate, country, government, city, boundary, 
continent, language, ethnicity, latitude, longitude, 
archipelago, coastline, Mexico, South America 

There is no particular order to specifying the list and the concepts were generated 
from the competency questions. This step provides the input for the next step, which is 
building the class hierarchy. While not every concept from this stage becomes a class, 
having a large pool of concepts relevant to the domain makes the hierarchy development 
easier. As in the requirements analysis for software development, the time and thought 
invested into the first two steps of the methodology provide great benefits and rewards 
during the subsequent steps of the methodology. 

3, Create the Class Hierarchy 

This step creates a class hierarchy by relating classes through the subsumption 
construct “is-a” relationship. An “is-a” relationship indicates that a member of a subclass 
is also a member of the superclass as shown in Figure35. 



Figure 35. Simple Class Flierarchy 


In the simple class hierarchy of Figure 35, classes ClassA, ClassF and 
ClassI are direct subclasses of owl : Things, which is the highest OWL-defined class 
of the hierarchy. Classes within the same level of the hierarchy are considered sibling 


8 As mentioned in the previous ehapter, all elasses in OWL-DL are subsumed under the parent 
elass of owfThing. This implies that all classes are consider subclasses of owhThing. This is an 
important concept to remember as the ontology incorporates properties and domain and range restrictions. 
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classes. Classes subsumed under other elasses are subclasses. All subelasses have a “is- 
a” relationship with their parent or superelass. Thus, members of Cl as sB are also 
members of ClassA aeeording to the definition of the is-a relationship between 
ClassA and ClassB. Similarly, members of ClassD and ClassE are also members 
of ClassB, whieh in turn are members of ClassA. It is important to have a firm 
understanding of this parent-ehild elass relationship to avoid problems with the later steps 
of the methodology. 

Organizing the elass hierarehy may be aeeomplished in several ways: top-down, 
bottom-up, or a eombination approaeh [Noy and MeGuiness, 2002, 6]. The top-down 
approaeh starts with the most general set of eoneepts and works down to the subsequent 
levels of speeialization. For example, the BodyOf Land and BodyOfWater classes are 
identified as the highest level of the Geography ontology hierarehy, and subsequent 
subelasses are subsumed under these two elasses. Thus, Ocean, River, and Lake are 
added as subelasses of BodyOfWater, and Mountain, Desert, and Plains as 
subelasses of BodyOf Land. The bottom-up approaeh starts with identifying the most 
speeifie elasses, then grouping them under a parent elass. In the Geography ontology 
example, the developer may start with LandlockedCountry, IslandCountry, and 
BiCoastalCountry elasses, whieh are then grouped as subelasses under the parent 
elass of Country. Similarly, lower level elasses sueh as DryClimate, 
PolarClimate, and TropicalHumidClimate are subsumed under the Climate 
parent elass. 

When grouping low-level eoneepts, developers should earefully differentiate 
between elasses and their instanees, known in OWL as individuals. A eareful 
examination of the list of eoneepts generated in step two of the methodology should help 
the developer differentiate classes from their instanees. The distinetion between a class 
and an individual is not always clear and often depends on the purpose of the ontology. 
This means that a concept that is a elass in one ontology may be more appropriately 
represented as an individual in another. However, elasses are generally “naturally 
oeeurring sets of things in a domain of diseourse” and individuals eorrespond to real- 
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world entities grouped under these elasses [Smith et ah, 2004, 19]. Classes represent a 
group of similar entities while individuals are the actual occurrences of these entities that 
make up the group. 

In the Geography ontology, Pacif icOcean is an instance of Ocean rather 
than its subclass since Pacif icOcean does not represent a group of entities but an 
actual entity itself. On the contrary, I s 1 andCount ry should be a subclass of 
Country rather than its instance since it represents a group of countries with the 
geographic landscape of an island, such as Ireland and Cuba. It is important to 
emphasize that the distinction between a class and an individual often depends on the 
purpose and scope of the ontology. 

The most common approach to organizing the ontology class hierarchy is the 
combination approach. This approach develops the class hierarchy by defining the most 
salient terms of the ontology, adding successive classes at the different levels of the 
hierarchy as appropriate. The advantage of the combination approach is that it allows the 
developer to start anywhere along the hierarchy and move up and down the stratum to 
add new classes as necessary. 

In the Geography ontology, two most salient classes are Country and Ocean. 
Using the combination approach, these two classes are defined at a top level of the 
hierarchy. Then, new concepts are added as parent classes or subclasses to these two 
initial set of classes. Furthermore, in line with the iterative development process, the 
hierarchy structure becomes refined as classes are moved from one position within the 
hierarchy to another. For example, as Figure 36 shows, in the first two iterations 
BodyOfWater and BodyOf Land classes occupied the top level of the hierarchy, as 
sibling to Polit icalGeography and Climate, however in the third iteration of the 
class hierarchy, the two classes fall under the parent class of PhysicalGeography. 
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Figure 36. Progression of the Class Hierarchy 


The development of the class hierarchy as described in this step falls under the 
“design” phase of the spiral development cycle. As the ontology evolves, the developer 
will revisit this step and modify the hierarchy as necessary. Additional requirements and 
knowledge acquired in the process refines the class taxonomy. In order to manage the 
constantly evolving ontology, detailed documentation and versioning is recommended. 


a. Disjointed Classes 

It is common for developers to make false assumptions about the 
relationship between OWL-DL classes. Specifically, developers presume that classes 
that do not share a superclass-subclass, or is-a, relationship are automatically disjoint. In 
OWL-DL, all classes are considered overlapping unless such separation or disjointness is 
made explicit. Specifying disjointness between classes requires an explicit specification 
using the OWL syntax owl: di s j ointwith. Only by defining a class as disjoint with 
others, the developer can assume class mutual exclusivity. 
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In the Geography ontology, disjointness between elasses Ocean, 
Mountain, and Country is not assumed. In other words, aceording to the eurrent 
specifieation. Ocean and Mountain classes can share the same individuals. In order to 
make classes disjoint from one another, disjointness must be specified explicitly using the 

owl: dis jointwith statement. 



Figure 37. Disjointed Classes 


Figure 37 indicates in Protege that the selected Bay class is disjointed 
from all of its sibling classes, as listed in the bottom right corner box. Without this 
explicit restriction, OWL does not exclude an individual from belonging to more than one 
class. Also, the disjoint restriction applies to all the subclasses under the specified class. 
By making BodyOfWater disjoint with BodyOf Land, all the subclasses of 
BodyOfWater are disjoint from all the subclasses of BodyOf Land. 

4, Define the Properties 

After defining the class hierarchy, the next step is to specify the class properties. 
Classes, without any properties or restrictions, have no useful meaning other than how 
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they relate to each other in the taxonomy. A property, which is defined at the class level, 
represents the relationship between two individuals, or between an individual and a literal 
string value. As discussed in the previous chapter, depending on whether it is an object 
property or datatype property, the range value of the property is either an OWL 
individual object or a literal string. Properties describe the relationships or links between 
real-world objects or values. Properties are the verb that link the subject and object in the 
RDF triples as explained in Chapter Two. Although there are properties that are unique 
to a given class, it is more common and recommended that properties be defined 
generically and be applied to classes as appropriate. That is, a property often applies to 
more than one class. However, property restrictions can be used to limit the applicability, 
using the domain and range specification. This will be discussed in more detail in Section 
5.C that defines the use of domain and ranges. 

Although properties may be used generically throughout an ontology, the 
developer should start defining them based on the characteristics of the classes. One way 
of thinking about these characteristics is the verb-object that applies to the class. For 
instance, for the class Parent, the most obvious property is “has child.” Likewise, the 
most salient property of the Child class is “has Parent.” Similar to the technique used 
to define classes, developers should start with the most obvious characteristics of a class 
and iteratively add, change, and refine these characteristics. 

In the Geography example, the characteristics of the Country class include “has 

border”, “has population”, “has capitaf’, “has language”, “has climate”, “has river”, “has 

lake”, “has mountain”, “ has government”, “has ethnic group”, and others. As the list 

shows, most these characteristics relate to other classe instances within the ontology. The 

Country class has a hasCapital property to denote the relationship it has with a 

capital city. Although the verbs can be arbitrary and are often specified at the discretion 

of the developer, it is advisable to use the most straightforward and direct description of 

the relationship. For the class characteristics that relate it to a data type, the property 

depicts the class’s relationship to a data string value. The property hasPopulat ion 

describes the link between the individuals of class Country and their population value. 

In this case, population is a numeric value that represents the number of people in a 
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particular country. Figure 38 shows the graphieal representation of the differenee 
between objeet oroperty and datatype property. 


hasCapital 



Italy Rome 

Object Property 


hasPopulation 



Sri Lanka "20,064,776" 

Datatype Property 


Figure 38. Two Types of Properties 


The objeet property hasCapital denotes the relationship between the individuals of 
Country and the individuals of CapitalCity, which are both OWL objeets. The 
datatype property hasPopulat ion, however, links the individuals of Country to a 
unique data string value, whieh in this case is “2 0, 0 64,77 6.” 

A partial list of properties for the Geography ontology is shown in Figure 39. 


ITdll Properties [g} [gl ^ [gf 

I □ I adjacentTo 
lOl consistsOf 

IdI containsPhysicalGeogi aphy 
lOl flowsliTto 
lOl hasBoi del 
lOl hasBoundai y 
I QI hasCapital i;' apitnIOf 
151 hasCity 
lOl hasCllmate 
lOl hasCoastline 
lOl hasCountry 
► I DI hasCountryDescii(3tor 
151 hasElevation 
lOl hasislands 
I □ IhasLandTvoe 


Figure 39. Geography Properties 


OWL-DL ontologies allow the speeifieation of different types of objeet 
properties. They inelude inverse, transitive, symmetrie, funetional and inverse functional 
properties. Eaeh of these properties consists of its unique OWL-DL eonstructs, which is 
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discussed in detail in the seetions below. It is important for the developer to correetly 
identify the type of property and speeify it in the ontology. 

a. Inverse Properties 

Properties having an opposite relationship to one another are known as 
inverse properties. An inverse property is denoted using the OWL syntax, 
owl: inverseOf , a subproperty of owl: Ob jectProperty, to indicate a diametric 
relationship to the specified inverse property. Generally, a property denotes a one- 
direetion relationship from subjeetto object, such as IndividualA “isParentOf” 
IndividualB. Logically, IsParentOf property, by itself, reveals no information 
about whether there is a eorresponding relationship in the other direction. Developer can 
create another property, ealled isChildOf , to assert an opposite relationship from 
IndividualB to IndividualA, by designating the property as the inverse property 
of isParentOf. 

Consider the Geography example in Figure 40. 



Figure 40. Inverse Property 


The seleeted hasCountry property is an inverse property of hasCity, 
whieh is shown under the “Inverse” specifieation slot. If an individual has a 
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ha s Count ry property value filled by another individual, then by the rule of inverse 
property those two individuals also have an opposite relationship via hasCity property 
That is, if the individual Venice, an instance of the City class, has the hasCountry 
property value of Italy, an instance of Country, Italy will automatically have the 
hasCity property value of Venice as shown in Figure 41. 



Figure 41. Individual Attributes of Inverse Properties 


When using the inverse property, the domain and range axioms should be 
carefully considered. Although the inverse property example shown in Figure 8 has the 
domain and range defined, it is equally valid to leave them undeclared, which defaults to 
the highest class owl : Thing. In fact, if domain and ranges specification are not 
compatible between inverse properties, it may cause an error in the ontology and lead to 
unintended consequences. 

b. Transitive & Symmetric Properties 

Similar to the inverse property, transitive and symmetric properties are 
subclasses of owl: Ob jectProperty and they assert information about the 
relationship of the individuals related by these properties. A transitive property is 
commonly used to represent “part-whole” relationships. That is, if transitive property Px 
links individuals X and Y as well as individuals Y and Z, then it is inferred, by the rules of 
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transitivity, that Pj relates X to Z. Figure 42 shows how Protege defines transitive and 
symmetric properties. 
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Figure 42. Transitive & Symmetric Properties 


In the Geography ontology, the locatedln is a transitive property and it 
is applied the individuals Vat icanCity, Rome, and Italy. That is, if 
Vat icanCity is “locatedln” Rome and Rome is “locatedln Italy”, then by 
the rule of transitivity, VaticanCity is “locatedln” Italy. While this 
implication is not explicitly stated in OWL or visible in Protege, the inferred relationship 
is made transparent when the ontology is used to make reasoning decisions. Inference 
engines, such as RacerPro, read the OWL syntax and make the implied link as defined by 
the transitive property. 

A symmetric property, on the other hand, allows the individuals to have a 
reciprocal or a bi-directional relationship. Unlike the inverse properties, a symmetric 
property is one relationship that is applied in both directions as shown in Figure 43. 
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Figure 43. Differenee between Inverse and Symmetric Properties 

Figure 43 shows the difference between inverse and symmetric properties. 
The isParentOf property and isChildOf properties denote opposite relationships, 
making them inverse properties. However, the symmetric property isSiblingOf 
allows the relationship to be bi-directional, allowing the subject to be the object and vise 
versa. 

In the Geography ontology, an example of symmetric properties is 
ad jacentTo. When individual A is ad jacentTo individual B, then by rule of 
symmetry, individual B is ad jacentTo individual A. Specifically, if individual 
Mexico is ad jacentTo UnitedStates, then it is inferred that UnitedStates is 
adjacentTo Mexico. 

c. Functional & Inverse Functional Properties 

A functional property indicates that, for a given individual, there can be at 
most one property value associated with that individual. For a functional property Pp, 
individual X is associated with at most one property value of individual Y. However, if 
Pf links X with another value, say individual Z, then by the rule of functional property, 
individual Y must equal individual Z. In other words, they are the same object or value 
with two separate instantiations. 

Consider the example of Figure 44 from the Geography ontology. In this 
example, the property hasCapital is a functional property. 
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Since hasCapital is 

afunctional property, 
these two individuals 
must be the same. 


Figure 44. Attributes of a Functional Property 

The individual UnitedStates is associated with two different 
hasCapital values, namely DistrictOf Columbia and WashingtonDC. 
However, since functional property must only have one value associated with a given 
individual, it must be inferred that these are equal objects. 

Similar to inverse property, inverse functional property denotes an 
opposite relationship to its “inverse-of ’ property, which is a functional property. Since 
functional property is restricted to one property value, the same is applied to the inverse 
functional property. For an inverse functional property Pif, if individuals X relates to 
individual Z and individual Y also relates to Z, then it is assumed that individual X equals 
to individual Y. An example of the inverse functional property from the Geography 
ontology is belongsToCountry property as shown in Figure 45. 

belongsToCountry 

District of 
Columbia 

Washington 

D.C. 

belongsToCountry 

Figure 45. Attributes of an Inverse Functional Property 

Figure 45 shows the values DistrictOf Columbia and WashingtonDC are both 
associated to UnitedStates by belongsToCountry property. By the rules of 
inverse functionality, it is inferred that these are equal and they are two instantiation of 
the same value. 


The inverse functional 
belongsToCountry 

property implies that 
these two are the same. 
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Figure 46 shows how the functional and inverse functional properties are 
designated in Protege. 



Figure 46. Functional and Inverse Functional Properties 


Figure 46 shows that the definition of hasCapital, which is a 
functional property, specifies that it has an inverse functional property relationship to 
isCapitalOf. Once two properties are linked by the inverse-of construct, the 
specification only needs to be stated once; in this case the isCapitalOf property 
definition will automatically show that it has an inverse functional relationship with 
hasCapital. 

5, Describe Classes Using Property Restrictions and Complex 
Definitions 

Once properties are defined, they are used to restrict and describe classes. In 
order to associate a property with a class definition, it must be used as part of the class 
restriction. There are three types of class descriptions in OWL-DL, namely enumeration, 
property restriction, and complex class definition. First, enumeration describes a class by 
exhaustively listing all of its members or instances in its definition using the OWL 
construct owl: oneOf . No other members, other than those listed under the definition 
can belong to the class. Second, there are two types of property restrictions, quantifier 
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(or value) and cardinality. The quantifier restrietions constrain the range value of the 
property when applied to the class definition. The cardinality restrictions constrain the 
number of property values the class instance is allowed. And third, complex class 
descriptions are defined using logical operators of intersection (AND), union (OR) and 
complement (NOT) of classes. They represent advanced class logic of OWL-DL. Along 
with describing the restrictions in detail, there are two OWL-DL concepts, the difference 
between universal and existential restrictions and understanding “open world” vs. “closed 
world” assumption, that frame the types of restrictions used for class descriptions. These 
will be discussed in detail below. 

a. Universal and Existential Restrictions 
One of the most common errors when using property restrictions to 
describe classes is the differentiating between universal and existential restrictions. 
Without understanding the meaning and implications of these two restrictions, it is likely 
that many developers will use the wrong restriction. To constrain the range value of a 
property, an existential restriction (someValuesFrom) should be used rather than a 
universal restriction (allValuesFrom). The existential restriction, denoted with the 
symbol "3", states that the individuals of the class being defined must have at lease one 
property relationship with the specified range of individuals. In other words, if a property 
restriction for ClassX is "3 Propertys ClassY"', then every individual of ClassX 
have at least one PropertyE relationship with an individual of ClassY. By this 
definition, however, it is possible to for individuals of ClassX to have PropertyE 
relationship with individuals of other classes as long as it satisfies the “at least one” 
requirement. It does not restrict the individuals to have PropertyE relationship with 
only the individuals of ClassY. On the other hand, universal restriction, denoted with 
the symbol "V", states that individuals of the class being defined must have all of their 
property relationships with the specified range of individuals. For ClassX with a 
property restriction of" 1/ Propertyu ClassY"', if individuals of ClassX have any 
Propertyu relationship, it must be with individuals of ClassY. However, it is 
possible for individuals of ClassX to not have any Propertyu values. Unlike the 
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existential restriction, universal restriction does not require the individuals to have any 
property relationship with a defined set of objects. 

In description logic, existential restrictions are used to limit the property 
range, requiring every individual of that class to have at least one property value from the 
specified range. In defining the Country class, the property containsFeatures 
uses the existential restriction someValuesFrom as shown in Figure 479. 


OWL: 

Class (Country) Subclass of PoliticalGeography 

Restriction (containsFeatures someValuesFrom BodyOfLand) 

Translation: 

Country class contains, amongst other things, some form of BodyOfLand 


Figure 47. Existential Restriction in OWL 


The existential restriction, "3 containsFeature BodyOfLand", 
requires that at lease one of the Country individual have a containsFeature 
property value from the individuals of BodyOfLand. As long as that requirement is 
satisfied, individuals of the Country class may have containsFeature property 
value from individuals from other classes, as shown in Figure 48. 



Figure 48. Existential Restriction Example 


If the Country class was defined by a universal restriction, the 
allValuesFrom semantic is used to constrain the class description (Eigure 49). 


9 The “translation” is the English paraphrasing of the OWL-DL semantics stated in the examples. 
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OWL: 

Class (Country) Subclass of PoliticalGeography 

Restriction (containsFeatures allValuesFrom BodyOfLand) 

Translation: 

Country class contains, amongst other things, some form of BodyOfLand 


Figure 49. Universal Restriction in OWL 


The universal restriction, "V containsFeature BodyOfLand", requires 
that if an individual of Country has a containsFeature property value, it must be 
an individual of BodyOfLand. Flowever, this restriction does not require all of 
Country individuals to have a containsFeature property value. Unlike the 
existential restriction, individuals may not be associated with any containsFeature 
relationships. This is shown in figure 50. 



Figure 50. Universal Restriction Example 


Developers should be clear about the appropriate type of property 
restrictions that should be applied to the class definitions. If the wrong restriction is 
applied to the class, there will be unforeseen consequences when the ontology is 
inferenced and affect the overall validity of the ontology. 

b. Open World vs. Closed World 

Most ontology developers, unfamiliar with open world reasoning of OWL, 
fail to make negation explicit. Databases, logic programming and frame languages are 
“closed world reasoning" systems which assume that when something is not found, it is 
false. However, description logic based languages, such as OWL-DL, associate negation 
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with “unsatisfiability.” That is, falsification can only be proven if eontradieting 
information is made explieit. 

Consider the IslandCountry and LandlockedCountry elass 
definitions in Figure 51. 


OWL: 

Class (IslandCountry) Subclass of Country 
Restriction (hasLandType someValuesFrom Island) 

Restriction (hasBorder someValuesFrom Ocean) 

Translation: 

IslandCountry is any country that has, amongst other things, some land type of Island and some 
border of Ocean. 


OWL: 

Class (LandlockedCountry) Subclass of Country 
Restriction (hasBorder someValuesFrom Country) 
complementOf (restriction (hasBorder someValueFrom Ocean)) 

Translation: 

LandlockedCountry is any country that has, amongst other things, some border of Country and does 
not have some border of Ocean. 


Figure 51. Definitions of IslandCountry and LandloekedCountry 


Intuitively, developers will define the elasses IslandCountry and 
LandlockedCountry as stated in Figure 51. When a developer uses the existential 
restriction statement someValuesFrom, to restrict the property value, it leaves the 
definition open to other assumptions, as expressed by the translation '"amongst other 
things." That is, these definitions are open to interpretation that the individuals can have 
property values other than what was specified with the someValuesFrom restrictions. 
Although these definitions are not technically invalid and do not cause problems by 
themselves, complications occur when you introduce other classes, such as 
ArchipelagoCountry, to the ontology. Consider the definitions in Figure 52. 
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OWL: 

Class (Archipelago) Subclass of Island 
Restriction (cardinality >2 Island) 

Translation: 

Archipelago is any island that consists of, amongst other things, at least two islands. 


OWL: 

Class (ArchipelagoCountry) Subclass of Country 
Restriction (hasLandType someValuesFrom Archipelago) 

Translation: 

ArchipelagoCountry is any country that has, amongst other things, some land type of 
Archipelago. 


Figure 52. Definitions of Arehipelago and ArehipelagoCountry 


Based on the speeifieations of Archipelago and 
ArchipelagoCountry, it may seem appropriate that ArchipelagoCountry elass 
be subsumed under IslandCountry elass, since Archipelago is a subclass of 
Island. However, given the open definition of IslandCountry, as shown in Figure 
19, ArchipelagoCountry will not be inferred or classified as an 
IslandCountry. The current definition of ArchipelagoCountry, described by 
the existential property restriction of someValueFrom, does not preclude the class 
from having a land type of something other than an Archipelago. Thus, 
ArchipelagoCountry can take on any form of land type and be classified as a 
LandlockedCountry as likely as any other type of country in the ontology. 

Based on the definition shown in Figure 20, open world reasoning makes 
no assumptions about the land type of the ArchipelagoCountry class simply based 
on the fact that other land type information was absent from the definition. In order for 
this class to be classified as a subclass of IslandCountry, as is the intention of the 
developers, it must explicitly exclude of all other land types in the class definition. This 
is accomplished by including a further restriction known as a closure axiom. According 
to Horridge et ah, "[a] close axiom on a property consists of a universal restriction that 
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acts along the property to say it ean only be filled by the specified fillers [Horridge et. ah, 
2004, 70]. The restricton has a filled that is a union of the fillers that oecur in the 
existential restrietion for the property." That is, elosure axiom adds the universal 
restriction allValuesFrom to the existing existential restriction to exclude other 
possible assumptions in the elass definition. Henee, it eloses the elass definitions to other 
interpretations, as shown in Figure 53. 


OWL: 

Class (ArchipelagoCountry) Subclass of Country 
Restriction (hasLandType someValuesFrom Archipelago) 

Restriction (hasLandType allValuesFrom Archipelago) 

Translation: 

ArchipelagoCountry is any country that has, amongst other things, some land type of Archipelago and 
only land type of Archipelago. 


Figure 53. New Definition of ArchipelagoCountry 


By applying the closure axiom, the definition of 
ArchipelagoCountry is no longer open or ambiguous. As written under the 
translation, the allValuesFrom speeifieation adds the restriction “only” to the 
definition, disallowing the hasLandType property from ineluding any individuals other 
than those belonging to the Archipelago elass. The elosure axiom should be applied 
to all elasses where sueh quantifier property restrietions apply; otherwise, infereneing 
tools eannot properly elassify the classes. Henee, elasses IslandCountry and 
LandlockedCountry should also inelude elosure axioms as shown in Figure 54. 
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OWL: 

Class (IslandCountry) Subclass of Country 
Restriction (hasLandType someValuesFrom Island) 

Restriction (hasLandType allValuesFrom Island) 

Restriction (hasBorder someValuesFrom Ocean) 

Restriction (hasBorder allValuesFrom Ocean) 

Translation: 

IslandCountry is any country that has, amongst other things, some land type of Island and some 
border of Ocean and only land type of Island and only border of Ocean. 


OWL: 

Class (LandlockedCountry) Subclass of Country 
Restriction (hasBorder someValuesFrom Country) 

Restriction (hasBorder allValuesFrom Country) 

complementOf (restriction (hasBorder someValueFrom Ocean)) 

complementOf (restriction (hasBorder allValuesFrom Ocean)) 

Translation: 

LandlockedCountry is any country that has, amongst other things, some border of Country and 
does not have some border of Ocean and only has border of Country and never has border of Ocean. 


Figure 54. New Definitions of IslandCountry and LandlockedCountry 
c. Domain and Range 

Each property has an associated domain and range as part of the property 
definition. When the property is initially created, its domain and range defaults to OWL's 
highest level class, the owl: Thing class. However, the developer may change the 
domain and range to other values. As mentioned in the previous chapter, the domain of a 
property associates the property with the class(es) it modifies and asserts that the subjects 
of such property statements must belong to the instance of the class. The range is the 
specified set of values, either class(es) or data string, that the property is allowed to take 
as its value. For an object property, the property links the individuals of the domain class 
to the individuals of the range class. Unlike the quantifier restrictions, domain and range 
are global axioms that are applied wherever the property is used, rather than only at the 
class description level. 

If the Geography ontology's hasCapital has a domain of Country 
and range ofCapitalCity, then this property is intended to connect the individuals of 


74 







Country to the individuals of CapitalCity whenever the hasCapital property is 
used as a restriction. 

When specifying an object property, it is important to define the range 
type of “instance,” rather than “class,” which is the other possible option in OWL (Figure 
55). Developers commonly mistake the range values of a property to be the class objects 
themselves, rather than the individual(s) that belong to that class. By choosing “class” as 
the range type, OWL will treat the class as an individual creating a type of “meta¬ 
statement” allowed only in the OWL-Full sublanguage. 



Figure 55. Selecting the Property Range Type 

Range of an object property may consist of individuals from more than 
one class. If property? has a range of individuals from ClassA and ClassB, then the 
possible values of property? include all the individuals, or the union, of Class A and 
Class B. In the Geography ontology, the property has Jurisdiction is a property 
that li nk s the individuals of the Government class to the individuals of both City and 
Country classes, via the property. When multiple classes are designated as the range, 
the OWL represents the range values as the union of classes, in this case the individuals 
that are either individuals of City or individuals of Country. 
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In OWL, domain and range constraints are axioms lo used for reasoning, 
rather than binding restrictions of the property. Therefore, misusing the constraints can 
cause significant errors and create unintended consequences when the ontology is 
classified or inferenced. Consider the example the hasBoundary property. The 
property’s domain is BodyOfLand and range is the union of Latitude and 
Longitude classes. However, no error is raised when this property is used to describe 
the relationship between individuals of the class Country and the individuals of 
Ocean, even though Ocean is not part of the specified range. If Brazil, an individual 
of Country, applies the hasBoundary property to associate with At lant icOcean, 
an individual of Ocean, the OWL statement will reads ‘"Brazil hasBoundary 
At 1 anticOcean” Although this relationship does not cause an error by itself, the 
problem occurs when the ontology is classified. Since hasBoundary property has 
defined domain and range axioms, BodyOfLand and union of Latitude and 
Longitude respectively, and since that does not align with the property applied to 
individuals of Country and Ocean, the classifier will make inferences based on the 
domain and range specification. In this scenario, the classifier will infer that Country 
is a subclass of BodyOfLand and Ocean is a subclass of Latitude and 
Longitude. Furthermore, if Ocean is defined as disjoint from Latitude or 
Longitude in its class definition, then OWL will raise an error because disjointed 
classes cannot have a superclass-subclass relationship. This unintended result of class 
subsumption is an error that can be avoided if the developers fully understand the 
consequences of designating the property domain and range. For most developers, it is 
recommended that they do not specify the domain and range, reducing the chances of 
serious errors in the ontology. 

d. Primitive and Defined Classes 

Unlike other languages, OWL differentiates between “primitive” and 
“defined” classes. Primitive classes, also referred to as “partial classes,” are those 
defined only by necessary conditions or restrictions. Defined, or “complete” classes, 

10 Axioms are general statements or assumptions accepted as true without demonstrated proof. 
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have at least one necessary and sufficient eondition. The difference between a primitive 
and defined class is the level of completeness associated with the class definition. 
Reasoning tools can base their classification inferences only on defined or complete 
classes; no definitive conclusions can be made on primitive classes. 

In the Geography ontology, CoastalCountry is a defined class 
because it contains necessary and sufficient conditions as part of the class specification as 
shown in Figure 56. 



Figure 56. Defined Class Example 

The necessary and sufficient conditions of the CoastalCountry class 
imply that any class that is country and has a land type of Coastline, amongst other 
things, is, by definition, a CoastalCountry. If this class was defined as primitive, 
with necessary conditions only, such unambiguous inference cannot be made. The 
primitive restrictions are insufficient to infer that the satisfaction of these conditions 
implies that it is the named class. It is crucial for developers to understand that unless 
classes are complete, using necessary and sufficient conditions, the classifier does not 
attempt to inference class subsumption. While the primitive class can only define its 
conditions, defined class are also defined by them. This difference is shown in Figure 57. 
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Figure 57. Necessary vs. Necessary & Sufficient Conditions 


The OWL construct used to indicate the defined class condition is 
owl: equivalentClass. As mentioned in the previous chapter, this construct states 
that the class being defined has the same description, or list of individual members, as the 
conditions specified under the owl: equivalentClass tag. The necessary and 
sufficient definition of CoastalCountry using owl: equivalentClass is shown 
in Figure 58. 


<owl:Class rdf:ID="CoastalCountry"> 

<owl:equivalentClass> 

<owl:Class> 

<owl:intersectionOf rdf:parseType="Collection"> 

<owl:Restriction> 

<owl:onProperty> 

<owl:ObjectProperty rdf:about="#hasLandType"/> 
</owl:onProperty> 

<owl:allvaluesFrom rdf:resource="#Coastline"/> 
</owl:Restriction> 

<owl:Restriction> 

<owl:onProperty> 

<owl:ObjectProperty rdf:about="#hasLandType"/> 
</owl:onProperty> 

<owl:someValuesFrom rdf:resource="#Coastline"/> 
</owl:Restriction> 

<owl:Class rdf:about="#Country"/> 

</owl:intersectionOf> 

</owl:Class> 

</owl:equivalentClass> 

</owl:Class> 


Figure 58. Definition of CoastalCountry in OWL 
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Classifiers cannot make assumptions about class subsumptions without the 
“necessary and sufficient” descriptions that makes classes complete. OWL ontology 
experts argue that developers do not sufficiently understand the importance of defined 
classes and wherever applicable, many classes should have a complete definition 
[Horridge, 2004, 57]. 


e. Complex Classes: Proper use of Logical Operators “AND” & 
“OR” 

Complex classes are built from joining simpler classes using logical 
operators such as “AND” (n) and “OR” (u). Complex classes are named classes, but 
with their restrictions stated under an anonymous class declaration. A class created using 
the AND (n) operator is an intersection class. An intersection class combines two or 
more classes, using an anonymous class description that restricts the individuals to the 
members of the intersection of these classes. A complex class created using the OR (u) 
operator is a union class. While the intersection class is made up of only the member 
belonging to all classes specified, a union class encompasses all the members of the 
classes included in the union, as diagramed in Figure 59. 


Intersection Class 

Country City 



Union Class 


Men 


Women 


' \ t f'l 0%V', 

00 / 


People 


Figure 59. Intersection Class vj'. Union Class 

The application of intersection and union operators are commonly misused 
because the logical conjunction and disjunction do not intuitively correspond with the 
linguistic use of “and” and “or.” The English statements such as “Name all the political 
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geography entities that are city and country,” does not make it distinctly clear whether it 
refers to entities that are simultaneously city and country, or both entities that are city and 
entities that are country. In its logical use, AND refers to the intersection, which make up 
the subsection, including members that belong to every intersected class. 

The Geography ontology example of an intersection class isCityState 
(CitynCountry). This class is defined in a way that the members must belong to 
both the City and Country classes; they are simultaneously individuals of City and 
individuals of Country. 

A union class, with the logical use of OR operator, contains all the 
members of each class included in the union. In the Geography ontology, the People 
class is a union class, (MennWomen). Members of the People class include all the 
individuals of Men and all the individuals of Women. 

6. Classify Ontology with a Reasoning Tool 

One of the main advantages for developing an OWL-DL ontology is its 
compatibility with classification or inferencing tools. These tools validate and find new 
classifications of the class hierarchy based on the class descriptions. The inferred 
classifications provide the developers with error-checking as well as recommendations on 
how the classes should be organized. This is tremendously valuable, especially with a 
large and complex ontology, because it allows the developers to verify the consistency of 
the class descriptions with the overall schema of the ontology. It is recommended that 
after every iteration of class descriptions, the developer should invoke the classifier to 
check the validity of the definitions. Classification tools, such as RacerPro, output any 
errors or inconsistencies they find in the ontology. 

Classifiers are also important for identifying multiple class inheritances. Multiple 
inheritance occurs when a class belongs to more than one superclass. When this happens, 
the inheriting class takes on the characteristics of all of its parent classes. Based on 
necessary and sufficient conditions of the classes, RacerPro finds classes that should be 
subsumed under more than one class. Although it is possible for the developers to 


designate multiple inheritance classes manually, it is recommended that the they create a 
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simple class hierarchy and let the classifier infer the multiple inheritances based on the 
class descriptions. It is argued that this method allows for a more manageable and 
modular ontology, which minimizes errors and maximized reuse of the ontology 
[Horridge et. ah, 2004, 69]. 

Using Protege as the ontology editor, the Geography ontology can be classified 
using RacerPro as the backend reasoning engine. Figure 60 shows the developer’s 
ontology before classification. 



Figure 60. Ontology Before Racer Classification 


The ontology, shown in Figure 60, consists of primitive and defined classes, 
differentiated by the yellow and orange icon colors respectively. The goal of classifying 
the ontology, based on the conditions and restrictions of the user-defined classes, is to 
find inferred relationship. This is especially important as an ontology grows in size and 
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complexity. However, even when an ontology is relatively simple, RaeerPro finds and 
cheeks the common classification errors made by developers. 


When the Geography ontology is classified, RaeerPro finds the errors as shown in 
Figure 61. 


BGeography Protege 3.0 (file:\C:\Documents%20and%20Settings\Ann%20Lee\My%20Documents\Ann's%20Directory\Thesis\Protege\0WL%20... 


Re Ecin Project Wizards Code l^ndow Help 

f ^@CW1.0asses [|jPlllPropat1ies ^ — Forms |f Individuals ||^^Metadala j 



Reasoner warnirrgs 1, Oassification Resuls f 


Figure 61. Ontology After Racer Classification 


According to RaeerPro’s infereneing engine, the user-defined Geography 
ontology, represented under “Asserted Hierarehy,” has three elassifieation errors. First, 
RaeerPro infers that CitySt ate, which was defined above as an interseetion of two 
classes, CitynCountry, has multiple inheritanees; CityState is a subelass of both 
City and Country. RaeerPro infers that the since the complex class description, the 
intersection of two classes, is defined as a necessary and sufficient condition, is should be 
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subsumed under City and Country classes separately. RacerPro’s second and third 
inference results state that the Men and Women classes should be subsumed under, rather 
than being siblings of, the People class. Like the CityState example, the class 
description of People is complete or defined, implying that the satisfaction of the class 
conditions infers equivalence with the class itself Since the People class is defined as 
a union class, MennWomen, all the individuals of these two classes also belong to the 
People class. 

7. Create Individuals and Fill Property Values 

OWL instantiates the ontology classes by creating individuals. Individuals 
represent the actual real-world entities of the interested domain that the ontology is 
attempting to categorize and link by property relationships. Furthermore, as shown 
throughout this chapter, individuals are used as part of class description and restrictions. 
As stated in Chapter Two, there are specific OWL constructs used to denote semantics of 
individuals, such as owl: hasValue, owl: sameAs, and owl: dlf f erentFrom. 
Likewise, individuals are used to define enumerated classes using owlioneOf. 

Many individuals that are included in an ontology are determined early in the 
development process, when the domain concepts are informally listed in Step Two of the 
development methodology. The concepts that were at the lowest level of specification, or 
cannot be grouped as a class, become the individuals. Unlike the other entities of an 
ontology, such as classes and properties, individuals are the actualization or instantiations 
of the descriptions. In the Geography ontology, some of the concepts appropriate as 
individuals are Italy, France, Mexico, Rome, VatlcanClty, Paclf IcOcean, 
GangesRlver, MtVesuvlus and LakeOntarlo. 
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Figure 62. Example of the Individual Florenee 


Figure 62 shows the list individuals of the City elass. For eaeh individual, there 
is an assoeiated list of properties speeified in the elass definition. These property values 
are determined within the individual deseription, as shown in the Protege editor window. 
Sinee properties denote relationships between individuals, or between an individual and a 
datatype string, the developer inputs these values at the individual instantiation stage of 
the developments. For example, the individuals of the City elass have the 
containsPhysicalGeography,adjacentTo, locatedin, and 
hasPopulat ionCount property values to be filled as part of the individual 
instantiation. The instanee Florence fills those properties slots with the appropriate 
values as speeified in Figure 62. 

Although this step is the least diffieult step of development stages, it is the most 
time eonsuming. Depending on the domain and scope, the number of individuals can 
grow tremendously large. However, as long as the schema of the ontology is fully 
developed and structurally valid, managing the individuals should not pose a challenge. 


E. OTHER CONSIDERATIONS FOR ONTOLOGY DEVELOPMENT 

OWL allows an ontology to import other ontologies. In the famous Wine 
ontology, the creators import the Food ontology to describe and pair the various types of 
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wine with food. By importing the Food ontology, the developers are able to make use of 
all the Food elasses, properties, individuals, and axioms as part of the Wine ontology and 
as part of the Wine class descriptions. Developers can also extend the imported ontology 
by adding further description of the Food classes. It is important to understand the 
difference between ontology reference and importing. References to other ontologies are 
commonly made using namespaces, such as rdf and rdf s, but the references do not 
allow the user to manipulate the objects of these ontologies. Importing allows the 
developer to have access to all of the axioms and objects of the ontology that are 
unavailable by reference. Furthermore, OWL imports are cyclic in that Wine ontology 
can import the Food ontology and the Food ontology can import the Wine ontology. 

Ontology imports are integrated to the existing ontology using namespaces similar 
to references. The namespace is associated with the URL, where the ontology is located. 
The Geography ontology imports the countries . owl ontology, which lists the ISO 
3166 country codes, available through Protege ontology library. 11 This URL for this 
ontology is http://www.bpiresearch.eom/BPMO/2004/03/03/cdl/Countries, which serves 
as the default namespace (Figure 63). 



Figure 63. Importing Ontologies with Protege 


11 Contributed by Dieter E. Jenz, Jenz & Partner GmbH, http://www.jenzundpartner.de/index.html. 
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The namespace URL has a # sign attached to it as a separation marker between 
the URL and the entities within the ontology. The namespace, which uniquely identifies 
the imported ontology, is associated with every entity of the imported ontology. 

However, rather than including the long URL with every class, property and individual, a 
prefix is used in its place. In this case, the prefix "country" will be used as the 
namespace. Once the Country ontology is imported, all the entities are “visible” to the 
geography ontology, as shown in Figure 64. 
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Figure 64. Classes and Properties from Imported Ontology 


Figure 64 shows the imported ontology has a prefix associated with every class 
and property. Now these objects can seamlessly be integrated into the Geography 
ontology as class descriptions and even take on extensions unique to this ontology. 


F. CONCLUSION 

This chapter provided a methodology for developing an OWL-DL ontology. The 

recommended seven steps approach, described above, should be used as a guide for 

SMEs to build an ontology in their areas of expertise. As emphasized earlier, since the 

scope and application of the ontology determines the content and structure of the 

ontology, a significant effort should be spent on understanding the goal of the ontology, 

as described in step one. Once that is defined, informally listing the relevant terms of the 
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domain, step two, is the best method to finding the appropriate elass objects of the 
ontology. Step three organizes those concepts into the class hierarchy. It is important at 
this stage to understand the difference between classes and individuals as well as the 
relationship of subclasses. Next, define the properties of the domain, as described in step 
four. OWL offers multiple property constructs. Understanding the semantics of these 
types of property and the kinds of relationship they imply are important in creating a rich 
ontology. Step five involves using these properties and other constructs to restrict and 
describe classes. Developers are advised to avoid common errors by using existential 
restriction as the default and using closure axioms to further limit the class definition. 
Likewise, it is important to remember that OWL is a language with open world 
reasoning. All necessary description should be stated explicitly. This step also explains 
the difference between primitive and defined classes. Classes are categorized as defined 
only when they use necessary and sufficient conditions as part of their descriptions. 

Once the classes completely defined, classification engine is used to inference the 
ontology, as stated in step six. It is at this stage where ontology validity and consistency 
are checked using an inferencing tool such as RacerPro. And finally, step seven 
describes how individuals are instantiated. They represent the real-world entities of the 
ontology's domain, rather than their abstractions. 

Although these seven steps were described linearly, the development process is 
iterative, as in the spiral model. It is likely, and even recommended, that the developer 
move forwards and backwards through the steps as necessary to improve and modify the 
ontology. And like other systems, success of an ontology depends on good management 
and maintenance. The structure of an OWL ontology makes it suitable for maintenance 
and updates. 

Given the difficulty of modeling real-world domain and knowledge into abstract 
ontological model, developing any ontology is a challenge. However, with a thorough 
understanding of OWL semantics and detailed planning of the development process, 
SMEs and others developers can build a useful knowledge representation system. 
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IV. ONTOLOGIES AS KNOWLEDGE BASES 


A. INTRODUCTION 

We define an effeetive seareh as one whieh returns to the user a set of highly 
relevant results. Using the appropriate keyword(s) is essential for suecessful researeh 
when performing a eonventional internet seareh. Very broad keywords will result in a 
large number of hits, many of them useless. In order to take advantage of eurrently 
available seareh engines to return topie-speeifie results, a highly relevant list of keywords 
and phrases needs to be formed. To develop sueh a list, users need to have aeeess to 
knowledge of the domain of eontext. An ontology, whieh is model of a domain of 
eontext, ean support the identifieation of preeise and relevant keywords. 

The Ontology-Aided Knowledge Diseovery Assistant (OAKDA), pronouneed 
"Oak D-A," developed as part of this thesis is an applieation that attempts to assist users 
to improve their Web searches by providing domain context to the search word or phrase. 
By navigating the ontology, users are assisted in finding a relevant set of key terms that 
will aid the search engines in narrowing, widening, or refocusing a Web search. The aim 
is to enhance the relevance and precision of the returned results through the use of a 
context provided by ontologies associated with each search. Additionally, in the process 
of mining the ontology, the users can discover knowledge about the concept of interest 
and other related terms in the domain. This chapter focuses on the purpose and 
motivation for the OAKDA and how ontologies can be used to augment the tools used to 
manage the vast amount of information and resources available in the Web. 

B, MOTIVATION FOR USING ONTOLOGIES 

An ontology can be defined as a formal explicit description of concepts in a 
domain of discourse, properties of each concept describing various features, attributes of 
the concept, and restrictions on these properties that are specified by semantics, or rules, 
that follows the “rules” of the domain of knowledge (Ushhold et ah, 1996, X). For these 
reasons, ontologies may have use as knowledge bases (KB) for an application attempting 
to add context to a particular search word or phrase. By navigating the ontologies, users 
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can understand the eontext of a partieular eoneept as well as the relationships it has with 
other eoneepts. Interest in the use of ontologies as knowledge bases is growing rapidly. 

If the availability of these ontologies inereases, their importanee may beeome more 
signifieant. Sinee no one ontology ean eover all aspeets of a domain and no two 
ontologies of the same domain will be the identieal, users reeeive greater value by 
aeeessing as many ontologies as possible. Similarly, ontology developers ean benefit 
from eaeh other sinee ontologies ean be built on top of others, expanding the breath and 
depth of any domain. Onee developed, ontologies ean be widely distributed and shared 
and used by both people and applieation systems. This seenario is what makes ontologies 
potentially very valuable. 

The fundamental purpose of an ontology is to improve the ability of humans and 
maehines to make judgments about data. Humans obtain and proeess information by 
reading the written word, whether on paper or on a eomputer sereen. Our understanding 
of how humans deeipher and proeess written language is not well understood and 
eonsequently, we've not managed to endow our maehines with the same eapabilities. 
Therefore, when maehines require information, it must be in a language and strueture that 
ean be understood by them. This is where ontologies are valuable by providing a method 
of translation. Undoubtedly, there is something lost in the translation for humans. 
Ontologies require people a good visualization strategy to be more easily aeeessible. For 
human beings, ontologies are more limited in terms of their deseriptive power as 
eompared with prose but ean be useful in eertain situations to provide a sueeinet 
overview and hierarehy of a domain. 

Regardless of the growing variety of applieations using ontologies, the benefits of 
using a knowledge representation system in a form of an ontology are reusability, 
interoperability, reliability, maintenanee, and knowledge aequisition. Although the 
method of making this eommunieation differs with applieations, an ontology allows both 
humans and maehines to have a method of eommunieating a domain knowledge in a 
eonsistent manner. 
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According to Jasper et al., there are four broad categories of ontology 
applieations, namely neutral authoring, ontology as specifieation, eommon aeeess to 
information, and ontology-based search. (Jasper et ah, 1999, 6) 

In “neutral authoring”, information is doeumented in a single language, whieh is 
then eonverted into different forms for reuse in various target systems. The main 
motivations for using this type of ontologies are the eost benefits of reuse and the 
portability of knowledge aeross multiple applieations. In order to support neutral 
authoring, supporting teehnologies sueh as a unidireetional ontology translator is 
required. 

“Ontology as speeifieation” applieations use a single ontology of a particular 
domain as the knowledge speeifieation basis for developing a partieular type of software. 
Sinee the software relies on the ontology to provide speeifie information of a domain, the 
ontology requires rieh semanties with as little ambiguity as possible. Unlike the neutral 
authoring approach, this application does not translate the ontology so mueh as it guides 
the target software development. Benefits of sueh systems inelude doeumentation, 
maintenanee, and reliability of the domain knowledge. 

Ontologies used as “common access to information” translate information into 
multiple formats. Using a mapping teehnique, the ontology renders sharing information 
between different platforms intelligible by using a shared understanding set of terms. 

The ontology provides a way of interoperability and knowledge reuse of disparate 
systems. Supporting teehnologies inelude translators and parser generators. 

Finally, “ontology-based seareh” applieations are used to seareh information 
repositories for relevant resourees. The motivation of using an ontology to assist the 
seareh is to retrieve a more preeise result. These applieations require teehnologies sueh 
as ontology browsers, seareh engine and infereneing tool. 

The OAKDA falls into the last eategory of ontology applieation. The goal of this 
applieation is to assist users in obtaining better seareh results by exploring the knowledge 
eontained in ontologies. Rather than relying on brute Web seareh engines approaehes, 
using OAKDA to “explore” a relevant domain and all the related eoncepts, relationships 
and properties of a seareh term would lead to a more effeetive list of results. Sinee the 
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tool has the capability to traverse the pertinent ontology that relate to the user’s area of 
interest and graphically view all the relevant concepts and their relationships, it allows 
the user to explore and better understand the domain of interest. This is a different 
method of retrieving and discovering information than a simple brute Web search. 

C. KNOWLEDGE DISCOVERY USING ONTOLOGIES 

It is best to illustrate the benefits of using ontologies to discover knowledge by 
using OAKDA with a number of example ontologies. The examples presented in the 
following three sections show how mining ontologies assist in domain knowledge 
discovery and the search for the right resources on the Web in the domains of wine, 
cartoon and geography. 

1. The Wine Domain 

Since the proliferation of the web pages on any topic imaginable, when someone 
wants to find information on a given subject the internet is now the first place to search. 
Whether it is for a profession, academic or personal purpose, the phrase “Google h” has 
become the ubiquitous solution. However, depending on the topic, “Googling” can 
actually provide more questions than answers. For example, if a user wants to search the 
Web on the topic of wine, there is no limit to the kind and number of resources produced 
by brute force search engines like Yahoo! and Google. For instance, a search of the word 
“Bordeaux” in Google results in 16.4 million hits, including sites for tourism to the 
Bordeaux region, the Universite Bordeaux, as well as Web sites selling Bordeaux wine, 
as shown in Figure 65. 
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Figure 65. Google Search Results 


These resources, while numerous, are not useful to someone with limited 
knowledge about wines in general and Bordeaux wines in particular. In fact, the 
information overload may creates more problems than solutions. When a user is 
unfamiliar with the search term, he or she greatly benefits from understanding the domain 
knowledge of the concept of interest. In other words, the user should learn the context in 
which the search term or topic belongs. 

One way of gathering information is to read a book in that subject or even peruse 
through all the various Web sites that the search term retrieves. The first method may be 
the most effective way of obtaining thorough knowledge on a topic, especially if the 
subject involves complex ideas or relationships. The second method of perusing all the 
various Web sites may be helpful and by process of elimination one can deduce the 
appropriate context of their search. However, without prior knowledge of the domain 
context, blind search can lead to misinformation. In both cases, the knowledge discovery 
can be time consuming. The third method is the use of an ontology. In this case, wine 
domain would be graphically represented in terms of classes, instances, and property 
relationships. 
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Although humans are trained to glean information primarily from text, graphieal 
representation ean aid understanding. By representing ontology graphically, individuals 
may be able to “learn” certain important facts about a knowledge domain more efficiently 
than reading text off of a page. 

In the case of the user searching on the topic Bordeaux, the user should first learn 
that it is a type of wine that is a special variety due to the location of its origin. This 
information is available if the user has a method of navigating a wine ontology to find 
how Bordeaux wine relates to other types of wine and what characteristics of Bordeaux 
distinguishes it from different wines, as well as other relevant concepts and relationships 
of the wine domain. 

There are various methods to “read” or mine an ontology. It can be displayed in 
the OWL syntax or other ontology languages. Otherwise, for easier visualization, 
ontologies can be viewed using an editor such as Protege. Figure 66 shows the Bordeaux 
as a class in the wine ontology in the Protege ontology editor. 



Figure 66. Protege View of the Wine OWL Ontology 

Using Protege, the user is able to view all the classes, properties and individuals 
of the ontology. The asserted ontology, shown in the leftmost window of Figure 66, is 
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classified using the elassifier RaeerPro to produee the inferred ontology in the seeond 
window. Based on the asserted deseription, the inferred ontology shows that the 
Bordeaux has two major subclasses, namely Red Bordeaux and White Bordeaux. Within 
these two elasses of Bordeaux wine, there are other subelasses, each distinguished by 
their unique eharaeteristies, sueh as region and type of grape used, while inheriting all the 
traits of the parent elasses. Red Bordeaux or White Bordeaux. 

Protege, like other ontology editors, makes it easy to read and navigate an 
ontology. However, in order to mine an ontology using an editor, the user must know 
exaetly what ontology he or she needs as well as have aeeess to them to load into the 
applieation. If users have no knowledge about their domains of interest, it is unlikely that 
they will have aeeess to the appropriate ontologies. For those users. Protege is not useful 
as an ontology-based seareh applieation. Protege is an applieation more appropriate for 
ontologies developers rather than the users. 

The OAKDA proposes to fill the gap between available ontologies representing 
various domain knowledge and the resourees on the Web. It is an applieation that allows 
users to seareh its database of ontologies for their seareh term of interest and find the 
appropriate ontology that fits the appropriate domain. When users speeify a search term 
of interest, OAKDA allows them choose from various ontologies and navigate along the 
most relevant ontology tree to diseover related eoneepts and relationships that were 
previously unknown to them. Using the example above, one ean seareh the OAKDA 
database of ontologies for the term “Bordeaux” as shown in Figure 67. 
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Figure 67. OAKDA Search Screen 


Once the search term is submitted and OAKDA finds a match in the database, it returns a 
list of hits, as in Figure 68. 
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Figure 68. List of Knowledge Base Search Results 

As Figure 68 shows, all the related terms are extracted from the wine OWL 

ontology. The “SCORE” of the search results is the ranking of match “closeness” and the 

“TYPE” refers to the element’s OWL ontology object types, such as class, property or 

individual. Once the user reviews the results, he/she may begin to have a better idea 

about what additional information is relevant to their original search. In this case, the 
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user may be interested in learning more about Red Bordeaux. If the user clieks on that 
term, OAKDA generates a graphical representation of the Red Bordeaux class and all the 
objects related to this main concept of interest. Figure 69 shows the Red Bordeaux object 
as the middle node and all the related objects, in this case super and sub classes of Red 
Bordeaux Class. The direction of arrows from the Bordeaux class as well as the color of 
the node identifies the type of class. 



Figure 69. OAKDA View of Red Bordeaux Class 

From Figure 69, the user discovers that Medoc is a subclass of Red Bordeaux for 
which the user would like to obtain additional information. The user can reorient the 
graph, using the "Orient about Node" command, which will show the Medoc class as the 
new central object of the graph. Figure 70 shows the result of the new orientation. Using 
this tool, one can easily navigate along the tree of the ontology hierarchy and view the 
classes, properties and individuals related to the search term of interest. 
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Figure 70. OAKDA View of Medoc Class 

After the graph is reoriented around the Medoc class, the user finds that there are 
two types of Medoc Red Bordeaux wines, known as Margaux and Pauillac, which are 
subclasses of the Medoc class. It also shows that the Medoc class inherits its properties 
from three parent-classes. Dry Red Wine, Red Table Wine and Red Bordeaux. That is, 
Medoc is a type of Red Bordeaux as well as a red table wine and a red dry wine. Hence, 
the user acquires knowledge about the domain he or she is interested in by traversing the 
ontology that graphically displays all the immediately related concepts and relationships 
of the search term. 

Furthermore, the OAKDA application automatically inferences any ontology 
loaded into the system, hiding the detail between asserted and inferred relationships 
between classes, as displayed in the Protege ontology editor in Figure 66. It is the 
inferred ontology, classified through RacerPro that identified all the multiple inheritances 
of the Medoc class. When the user searches the ontologies using OAKDA, the 
inferencing automatically occurs behind the scene and the user is able to view a valid and 
accurately classified ontology. 

In this simple example, OAKDA shows how it can assist users to search for 
additional information around the initial search term without the user having an accurate 
understanding of the domain or context of the term. All the concepts in this example 
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were in the elass level of the ontology. However, users may also use OAKDA to find 
instances and attributes of the search term. Examples of discovering these concepts and 
relationships in OAKDA are explained in next two sections. 

2. The Cartoon Domain 

When applicable, OAKDA also displays individuals and properties, along with 
class concepts. For example, suppose a user is interested in finding resources on the 
search term "Millicent." This concept does not provide enough information to retrieve 
meaningful search results without additional context. To add context, also suppose that 
the user knew that Millicent is a cartoon character. However, even with this additional 
contextual information, performing a brute search on the terms "Millicent and Cartoon" 
lists results that does not provide user with a consistent set of contextual or domain 
knowledge. Instead, it would be helpful if the user can navigate the cartoon ontology to 
understanding exactly where Millicent fits in the world of cartoon characters. 

Once the user searches the term in OAKDA, it will list all the relevant ontologies 
that have Millicent as a match. When the search returns the results of the match, 

Millicent is found as an individual in the cartoon ontology. Figure 71 shows Millicent, as 
the center node and highlighted in yellow, as an instance of the Mickey Mouse class. It 
also shows that it has property relationships with other individuals. For instance, 
Millicent has an “is niece of’ relationship with Minnie, which in turn has an inverse 
property, “has niece” relationship, with Millicent. As mentioned in the previous two 
chapters, class properties denote relationships between individuals. Therefore, properties 
are not visible to the users unless there are instantiated individuals in the ontology. 
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Figure 71. OAKDA View of Millieent Individual 


Similar to the wine ontology example above, the best way to navigate an ontology 
or explore the domain using the OAKA applieation is to reorient the graph around 
different objeets of the ontology graph. After understanding all the relationships around 
Millieent, the user may want to diseover more information about Miekey. This 
eompletely reorients the ontology graph with the new objeet, Miekey, as the eenter of the 
graph, as shown in Figure 72. 
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Figure 72. OAKDA View of Mickey Individual 


In Figure 72, the Mickey individual is now the central point of the graph and all 
the immediate concepts and relationships are depicted around the new object. The new 
graph now shows a different section of the ontology revealing other concepts that were 
not included as part of the Millicent graph. When the OAKDA graph is centered on an 
individual, as in Millicent or Mickey, the user sees all the different relationships that the 
key concept has with other objects or values. Again, the user not only discovers 
information about the original search term, the user also gains knowledge on the 
tangential concepts of the domain. 

Once the user obtains sufficient knowledge about the domain and the relevant 
contextual terms, he or she will have a better understanding of the types of information or 
resources to search for on the Web. OAKDA makes it easy for the user to design a list of 
search terms based on the graphical ontology representation. When the user finds a term 
that belongs as part of the search string, he or she can simply uses a right-click drop¬ 
down menu of the mouse to add it to a list. Figure 73 shows how a user adds the name of 
an object to the search list, using the "Add as Search Term" command. 
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Figure 73. Adding Terms to the Search List 

The user may add as many terms as desired; however, just as having too few 
selections can lead to an overload of Web page “hits,” too many search terms may 
eliminate many relevant resources. It is recommended that the user choose different 
combinations of relevant terms to compare the results of the matched resources. 

Once all the search terms are added to the OAKDA list, the user can manually 
edit them as appropriate. In the case of Millicent, the user has learned by using OAKDA 
that it is an instance of Mickey Mouse cartoon character. Therefore, the list of search 
terms include “Millicent” and “Mickey Mouse.” In Figure 74, the user can combine 
these two terms using different logical operators to perform the Web search. These 
operators have the same restrictions and conditions as those available in most search 
engines. The default, likewise, is the “AND” operator as is in other search tools. 
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Figure 74. OAKDA Web Search Parameter List 


Although the list in Figure 74 is relatively simple, it adds a context to the term 
Millicent that the user did not know before using OAKDA. Rather than just searching on 
“Millicenf ’ or “Millicent and cartoon,” the above list of terms provides the relevant 
context to the user’s Web search. Once the list of search terms in complete, the user 
submits the query to OAKDA. Figure 75 shows the results of this query. 



Figure 75. Web Search Results 
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The list of results shown in Figure 75 is derived using the same brute force search 
algorithm used by most search engines such as Google and Yahoo! However, by 
tailoring the list of search terms after discovering relevant domain or contextual 
knowledge of the user’s initial query, the Web search matches will be significantly 
different from the original set. The contribution that OAKDA makes is not to change the 
way Web searches are performed, but to assist users to get the most appropriate 
information by providing a method of obtain additional knoweldge about their term of 
interest. 

3, The Geography Domain 

Even in a domain that a user may be familiar with, it is possible to discover new 
relationships by mining the domain ontology. If an ontology is developed as a domain 
knowledge representation (KR) for general application purposes, it can be used as a 
reference to easily learn new information. The geography domain falls under this 
category. As detailed in Chapter 3, this ontology was developed for the purpose of 
demonstrating the ontology development methodology, and also to be used as one of the 
ontologies in the OAKDA knowledge base. In this section, an example of how to use the 
geography ontology for knowledge discovery will be shown using OAKDA. 

Suppose a user is deciding to go on vacation and want to find a place that meets a 
certain set of criteria, such as physical geography features and proximity to other 
locations. Of course, he or she can go to travel Web resources to discover different 
destination options. However, obtaining a better understanding of the world geography 
domain can help narrow down the traveler’s options. For instance, a particular traveler is 
a nature-lover who is interested in finding a location that is near a lake, river, and 
mountain, and possibly all this one a small island. This traveler is also interested in 
seeing different types of physical geography in a relatively small area or a place where he 
can walk through a rainforest as well as a desert. In order to discover such a location, the 
traveler needs to find information on the different geographies of potential destinations. 

Using the OAKDA application, the user decides to search on the term "island 
country" as a start. A list of ontology matches is presented and the user selects the most 
relevant concept from the geography ontology, having the highest score ranking. 
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OAKDA graphs the Island Country class, as a center node, and its related concepts as 
shown in Figure 76. 



Figure 76. Island Country from the Geography Ontology 


The graph shows that there are several individuals, shown as oval nodes with the 
arrow point towards the center, of the Island Country class. This implies that these are 
real-world instantiations of the island country class, and shown in the results the user 
confirms that these individuals are island countries. The user can learn more about any 
one of these individuals by reorienting the graph around that node. In this case, the 
traveler wants to find more information on Madagascar and reorients the graph around 
that individual node. Figure 77 shows the resulting graph. In general, an ontology's 
richness is represented at the instantiation level. While the ontology graph at the class 
level merely shows the taxonomy of classes and their instances, orienting the graph 
around an individual node provides information on its class as well as all the property 
relationships it has with other individuals. In other words, the information presented at 
this level displays all the semantic relationships the individuals have with others in the 
domain. Reorienting the ontology graph around Madagascar individual, the user now can 
see all of its relationship to its class as well as the types of relationships or links, shown 


105 

































as a circle node between two individuals, it has with all the other individuals in the 
ontology. 



Figure 77. Madagasear Individual Centered View 

Aeeording to the geography ontology, shown in Figure 77, Madagasear is an 
island eountry located in the eontinent of Afriea and is surrounded by the Indian Ocean. 

It has a Tropieal Humid climate and it contains such geographical features as 
Maromokofro Mountains, Anjafy Plateau, Mahajamba River and Lake Alaotra. 
Madagascar also has a rainforest, named Masoala. All this information ean be gleaned 
from the OAKDA graph representation of the Madagasear individual. Furthermore, the 
traveler also finds out information he was not speeifioally searehing for but may find 
interesting or useful, sueh as the languages spoken in Madagasear ineludes Freneh, and 
that it is governed by a republie government. In this example, the user finds a mateh to 
his seareh for an ideal location for a vacation by mining the geography ontology, and ean 
narrow his Web search for a travel ageney or resouree aecordingly. 
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In this example, the user was not neeessarily looking for an appropriate eontext or 
domain of a seareh term, rather the user used the applieation to find a mateh to a set of 
eriteria. That is, starting from a high level eoneept, island eountry, OAKDA helped the 
user drill down the ontology to the representation of real-world entities, individuals, 
where the user ean learn all the attributes or properties they have with other entities. For 
instanee, by mining the individuals of the island elass, the traveler narrows down his 
options to those that meet his eriteria. Of eourse, the user eould have begun his seareh 
with a different term, sueh as rainforest or tropieal humid climate and come to the same 
results. No matter which node of the ontology the search or mining starts, OAKDA 
allows the users to traverse up and down the tree to find the related information. 

However, it is only at the lowest level of detail, or instantiation of classes, that one is able 
to see the rich semantic relationships of individuals and where find detailed and complex 
knowledge about the concept of interest and the domain overall. 

Using OAKDA, users can focus on a narrow area of interest from relatively a 
large domain, such as a geography domain. Rather than search through volumes of 
mostly irrelevant web pages to find the right combination of information, one can quickly 
get to the necessary information in one site by searching for the right node and then 
mining its related concepts. In this example, the user understands the basic relationship 
of the geography concepts, but needed to know the specific instances that had the right 
combination of search criteria. OAKDA provided the traveler with the appropriate 
geography domain content information that allowed him to discover new knowledge that 
is of interest to him. 

In the three examples of using OAKDA to assist in Web searches suggest there 
may be utility in using ontology to discover knowledge of value to users. It can be as 
simple as navigating the class hierarchy of a domain, such as the wine example or as 
complex as learning all the minute relationships of the individuals. The goal of the 
OAKDA application is to assist its users to easily discover knowledge or information that 
is difficult with a Web search engine alone. Furthermore, by representing the ontology 
graphically and reorienting around different nodes, it is easier to grasp the high and low 
level details than an OWL document or even an ontology editor tool like Protege. 
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D. OTHER ONTOLOGY SEARCH APPLICATIONS 

Using ontologies in a search capability is not unique to OAKDA. Ontology's 
semantic richness represented in languages that are XML based, such as RDF, 
OIL+DAML, or OWL, makes it a powerful tool. Specifically there are two application 
of interest, namely OntoSearch ( http://www.ontosearch.org/) and OntoXpl, that are 
relevant to OAKDA. 

OntoSearch is an application that searches for ontologies via the Web. The goal 
of this application is to encourage reuse of knowledge bases represented in ontologies 
and provide a mechani s m for searching for existing ontologies on the Web. The user 
inputs the search word or words and the application finds the matching ontologies 
available on the Internet. The user also selects the "type" of ontology needed, namely 
RDF, RDFS, DAML, or OWL. Similar to OAKDA, OntoSearch also allows graphing 
capabilities of ontologies. OntoSearch is an important application that complements 
OAKDA. The success of the OAKDA depends on the availability of ontologies that 
covers all the various domains of the real world. It needs to build a library of ontologies 
as its database and the OntoSearch application is a useful tool to find those that are 
available on the Web. 

The OntoXpl, Ontology Explorer Tool, is an application developed at the 
Concordia University (Canada) that assists in the exploration of ontologies using Racer as 
an inferencing engine of OWL DL. This application is to complement ontology editors 
and visualizations tools such as Protege, with the emphasis on exploring different levels 
of an ontology. The user choose an ontology and the application models it into eight 
browsing categories, namely "file selector, "natural language" description, structural 
information, exploration of concept/property axioms, inspection of concept and role 
hierarchies, view of statistical information, inspection of A-box graph structure, and the 
interactive use of RACER's query language, nRQE." (Haarslev et al, 2004, 3). 

Although these other ontology-based applications exist, OAKDA is unique in 
storing a database of ontologies and matching the user search term to the appropriate 

ontology. This aids the users to find relevant domain content and design Web searches 
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that will result in the most useful list of resources. Unlike OntoSearch and OntoXpl, 
users need not know anything about ontologies to use the application. OAKDA is an 
end-users' tool rather than a developer’s one. 


E. CONCLUSION 

There may be great value in representing knowledge in a format that can be 
processed by machines, as well as humans. By leveraging the available semantics of 
RDF and OWL, ontologies can model the concepts and relationships of a real world 
domain that systems can "read" and inference based on the rules of logic. Since valid 
ontologies are often difficult to build, there must be incentives for the domain experts to 
construct them. Ontologies can become more powerful as more applications are 
developed to take advantage of their structured knowledge representation. 

OAKDA is an application that uses a library of ontologies to look for the user 
search terms. It hides the details of the OWL constructs and presents the users with an 
interactive graph that helps them discover information that they did not previously know 
but might be useful for them. Its purpose is to add relevant context to user Web searches 
to assist in retrieving the most relevant Web page when using brute force search engines 
such as Yahoo! and Google. 
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V. ARCHITECTURE OF ONTOLOGY AIDED KNOWLEDGE 
DISCOVERY ASSISTANT (OAKDA) APPLICATION 

A. INTRODUCTION 

This chapter describes in detail the software and hardware arehitecture of the 
Ontology Aided Knowledge Diseovery Application (OAKDA) Prototype. As discussed 
in the previous chapters, the objective of OAKDA is to leverage OWL-DL ontologies to 
help end users to improve upon the formulation of their web search query by providing a 
framework and environment to discover terminology that makes their web searehes more 
relevant and precise. 

OAKDA enables the user to augment an initial set of search content with data 
derived from ontology files in one or more domains of interest. With additional search 
terms and better understanding of the domain of interest, users can create queries that are 
both focused and relevant. Results are delivered back to the users through a web seareh 
portal. 


The applieation facilitates the above scenario by providing a framework to: 

1. Pre-seareh a database of ontology files with the user’s initial set of search 
terms to help the user loeate the ontologies that are relevant. 

2. Perform Deseription Logics (DL) inferencing to represent the selected 
ontology in its fullest meaning. 

3. Provide a means to for the user to explore a relevant ontology by 
implementing a graphieal interface that makes navigation between linked 
ontology elements easy and intuitive. 

4. Assist the user in formatting the gathered search content into a web search 
query. 

One way OAKDA can be used to explore an ontology is to move up and down the 
taxonomy of its elasses. Alternatively, the application can traverse an ontology through 
relationships between the instantiations of different elasses. Asserted content in 
ontologies are conneeted by links between instantiated classes known as “Individuals”. 
Individuals in an ontology are related through individual/property relationships. Class 
definitions contain members called properties that serve to deseribe relationships to other 
individuals or eoncrete data types. For example, the class Human may have a property 

called hasChild. The hasChild property speeifies that it will be “filled” by a class called 
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Human. Suppose we instantiate two individuals from the Human elass and call them 
Bill and Mary. If Mary is Bill’s daughter, then we can relate these two individuals 
together through the hasChild property relationship where BilLhasChild Mary. 


The following is a typical use case for OAKDA: 

• The user starts with an initial search term related to the domain of interest. 
The user is assumed to have little acquaintance of the knowledge domain. 

• The user searches all stored OWL-DL data for string matches. 

• The user navigates a chosen OWL-DL ontology to extract related information 
to expand the initial search term. 

• The user searches the web with the new terms discovered from the OWL-DL 
ontology. 

Because the content of an ontology is connected in meaningful ways, the end user 
may discover information that they otherwise might not be aware of if they have a means 
to traverse terminological (class) and asserted (individual, property) relationships. 

An end user may also be able to uncover knowledge domains that contain their 
initial search term but are not contextually correct. Knowing about these knowledge 
domains may also useful for web search because the information can be used to signal the 
search engine to avoid retrieving pages that contain terminology from a domain that is 
not relevant. 

The following sections describe in detail the system elements of OAKDA. 

Section B describes in generic terms, the software architecture framework that OAKDA 
development was based upon. Section C provides a description of each component, its 
attributes and roles in the application. Section D shows in detail how these components 
interact with each other. The conclusion provides some thoughts and lessons learned 
during the process of developing the prototype. 


B, MULTI-TIER APPLICATION VS. SINGLE TIER ARCHITECTURES 

A multi-tier software architecture is a type of client/server architecture whereby 
the logical components of the application are segregated by function into a composition 
of layers, known as “tiers”, that divide up the components of the system. 
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Layers of multi-tier applieations are defined by groupings of program 
funetionality similar to encapsulation in Object Oriented Programming. Each layer 
represents a grouping of logic that possesses a small number of interfaces that can be 
used to send messages. This arrangement creates interconnections with a small number 
of interfaces between the tiers. Because only a few interfaces need to be broken to isolate 
the tiers, they are considered “loosely coupled”. The goal of having loosely coupled 
architectures is to limit the effect that programming changes in one tier will have on other 
tiers. The segregation of logic promotes reuse of components and reduces development 
and maintenance costs to design by limiting the scope and complexity of each tier. 

The multi-tier architecture contrasts sharply with the type of architecture used for 
mainframe computing, now known as single-tier architectures. These software 
applications consisted of a monolithic cluster of code where all function points were in 
scope from any part of the program, i.e. the designers were not constrained from having 
presentation logic make direct calls into the data retrieval logic, etc. The effect of this 
approach was a design where all parts of the program were “tightly coupled” and changes 
to one part of the program would have cascading effects throughout the code. The 
Single-Tier design increases the level of difficulty for software maintainability and ability 
to change to meet new or evolving requirements. 

1. Presentation GUI Tier 

This tier is composed of the software installed on the client side computer that 
displays a graphical user interface (GUI) for the application. The interface could be a 
general purpose program such as a web browser or a specialized program specially built 
to interact with a server. 

2. Presentation Logic Tier 

The presentation logic tier is responsible for provisioning the interface 
information to the client side and performing the steps to assemble the content which 
users will view and possibly interact with. There are three distinct types of presentation 
logic sub-tiers: 

1. Web Tier: The web tier is descriptive of systems using HTTP for messaging 
between the client and server. The content on this tier consists of HTML, 
XML, CSS, and/or JavaScript that is rendered by the client web browser. 
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Programs on this tier are responsible for assembling eontent and getting the 
data from the business and data aeeess tiers. The Web Tier also eonsists of 
the Web Server and Applieation Servers responsible to listen for HTTP 
request messages from multiple clients and for providing a dispatching 
interface to the programs that assemble HTTP responses. All data processing 
for this tier occurs on the server side. 

2. Proxy Tier: Programs involving the “proxy” tier support a distributed 
computing architecture. This tier’s functionality is most often provided by a 
third party’s Web Services, described in more detail below. 

3. Client Interface: Unlike the Web and Proxy tier, this component will execute 
on the client side of the system but its business rules are downloaded from the 
server. It is responsible for rendering custom displays for the end user. 

3, Business Logic Tier 

This tier contains the “business rules” used to perform calculations or transform 
and manipulate data. 

4, Data Access Tier 

This tier includes any components that are used to interact with or access 
information on the Data Tier. Examples of these components are operating system API’s 
used to store data on host file systems or DBMS (Database Management System) access 
API’s such as ODBC (Open Database Connectivity) and JDBC (Java™ Database 
Connectivity). 

5, Data Tier: 

Data that needs to be “remembered” by the application or is used as a driver for 
future processing is stored on this Tier. It is the layer that stores all the application data 
and is an essential part of any Multi-tier application. The data usually resides in a 
database or file system accessible by the server. 

As indicated in the previous section, the intention behind structuring software in 
Multi-Tier Architectures is to enhance the ability of the system to adapt to change or at 
least allow large sections of a project to be reused in new applications. This approach 
was chosen for the OAKDA prototype in order to make it more adaptable to changes 
during its development and lifecycle. 
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C. OAKDA PROTOTYPE MULIT-TIER ARCHITECURE 

Table 7 below shows an overview of the OAKDA eomponents and their 
assoeiated tiers. Additional detail deseribing eaeh OAKDA tier eomponent is provided in 
the seetion below. 


TIER 

OAKDA COMPONENTS 

Presentation GUI 

Web Browser (Microsoft Internet Explorer, Mozilla Fire fox, 
etc) 

Presentation 

Logic 

Web 

Apache Web server 

Tomcat Application server 

Java Servlets 

HTMF 

CSS 

JavaScript 

Proxy 

Google SOAP Web Services 

Client 

Interface 

Java TouchGraph Applet 

Java Http API 

Business Tier 

Java (match algorithm, data transformations, graph data 
construction) 

Racer 

Data Access Tier 

JRacer 

Racer 

JDBC 

Data Tier 

MySql Database 

OWF-DF Ontology Files 


Table 7. OAKDA Multi-Tier Component Matrix 


1. Presentation Tier 

The Presentation Tier eode runs on a Web browser. The Web browser is not 
aetually developed as part of the applieation; rather it is an off-the-shelf elient software 
that is required to interfaee with HTTP messages that OAKDA reeeives and produees. 
Web browsers render Web-based doeument content into an interactive graphical user 
interface. The OAKDA application assumes that a Web browser such as Internet 
Explorer or Mozilla FireFox is available to the end user and that it is configured to 
process CSS, JavaScript, and Sun Java™ Applets. 
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2, Presentation Logic Tier 

The Presentation Logie Tier provides the information that provisions the user 
interfaee on the client side. The OAKDA application uses the following languages and 
components to render a GUI presentation for the end user. 

a. Presentation Logic: Web Tier 

The Web Tier consists of those Presentation Logic components that use 
Internet based languages, technologies and methodologies. 

(1) HTML (Hypertext Markup Language) is a non-proprietary 
subset of the SGML markup language. HTML is treated as a specification by Web 
Browsers for rendering viewable document content and Web forms. The bulk of 
OAKDA screens are rendered in HTML 

(2) CSS (Cascading Style Sheets) is a mechanism for adding 
style to HTML documents. It performs a number of presentation effects, like positioning, 
which serves to augment the HTML specification. CSS is ancillary to the maintainability 
and reusability of Web pages by separating the presentation from the content. 

(3) JavaScript is a scripting language that executes on a thread 
provided by the Web browser’s (client side) process. Web content can be made to be 
more interactive and dynamic using JavaScript. In OAKDA, JavaScript is used most 
often to implement the Web page navigation scheme. 

(4) Apache Web Server - A Web server is a software program 
designed to deliver data, usually in the form of HTTP messages, across a TCP/IP network 
between client and server computers. The sequence of events for client/server 
communication over HTTP is usually as follows: 

• The client sends an HTTP request to the server. The request 
consists of the IP address of the computer running the server, 
the IP of the computer making the request and parameters that 
may specify the type of content the client is requesting. 

• The Server responds with an HTTP response containing the 
requested content, if found. 

The Apache Web Server, obtained from the Apache Software 
Foundation, was the Web Server of choice for the OAKDA project. It is open source and 
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freely available software. The Apaehe Software Foundation is an organization founded 
to faeilitate the development of several software projeets. Sinee 1999, the Apache Web 
Server has been the most visible and popular HTTP server in use on the Internet. 

(5) Apache Tomcat Application Server - A Web server’s main 
function is serving HTML documents to a requesting client. However, when the content 
of those documents is dynamic, some kind of data processing must be accomplished to 
perform calculations or retrieve data necessary to construct the content on the fly. 
Application servers are the components that facilitate the interface between a HTTP 
message handled by the Web server and the invocation of a specific program to perform 
data processing. An application server can be thought of as a type of middleware that 
handles messaging between the Web server and the various back-end applications, such 
as databases or programs that implement business logic in a data processing environment. 

Jakarta, a sub-project of the Apache project, facilitates Java™ 
based open-source projects. The Tomcat application server is one the Apache Jakarta 
projects utilized by OAKDA. Its distinguishing feature is that it enables a special type of 
Java™ class called a Servlet to perform HTTP messaging toward the Web server and it is 
the program entry point for the invocation of Java^M methods utilizing the server side 
components. 

(6) Java Servlets - A servlet is a Java™ class that is built to 
interface with HTTP on an application server. It reads HTTP parameters, manages HTTP 
sessions, and handles other services such as Web site authentication. Output data is sent 
using built-in servlet methods that create the HTTP response messages to be transmitted 
to the client via the Web server. 

b. Presentation Logic: Proxy Tier 

The Proxy Tier is the sub-category of the Presentation Logic that deals 
with the use of third party components in the architecture. 

(1) Google Web Search Service - Web Services are application 

level services which enable inter-process communications between computers across 

Internet or intranet network boundaries. When processes communicate, they require the 

ability to transmit data arguments to “call” a far process and transmit “return” messages 

to a calling program. Web Services provide a framework for transmission of calls and 
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“return” data via XML formatted request (eall) and response (return) doeuments. The 
XML requests and responses are reeeived and dispatehed by Web servers, usually over 
HTTP. XML and Web Serviees act as a middleware that eliminates the need for data 
processing systems to interoperate directly with each other. Since the remote procedure 
calls are performed in a way that is platform non-specific, the programs executing the 
business rules do not need to be compliant with each other's software or hardware 
platforms. In this way, Web services solve compatibility and interoperability problems 
common in other types of network based inter-process communication. 

There are three key components to Web services. 

• Simple Object Access Protocol (SOAP) defines the XML 
grammar for Web services requests and responses. A 
SOAP envelope is the name of the XML document that is 
transmitted as request or response. Inside the envelope for 
a request are the function call names, the parameter list and 
all other information needed for the peer system to make a 
service invocation. SOAP response envelopes are similar 
to requests in that they are a formatted XML documents, 
but they contain the “return” data of the request or fault 
messages if the request could not be processed. 

• The WSDL (Web Service Description Language) 
component serves to describe the specifications for the 
SOAP request and response document formats. 
Analogously, when calling a method of a linked program, 
the programmer needs to know the method name, the 
parameter list, the data types of each parameter and the 
return value. A program that does not properly reference a 
method specification will usually not compile. In the case 
of Web services, the components that interact are separate 
entities with no “awareness” of each other. The WSDL is a 
way for the Web services host to advertise the correct 
specification that should be used to construct SOAP syntax 
for successful interoperability. 

• UDDI (Universal Discovery Description Integration) is an 
internet or intranet facing directory that serves to advertise 
Web service capabilities available for use. The UDDI can 
provide the basic information needed to make contact with 
a Web service host including the download of a specific 
Web service’s WSDL. 
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(2) Google Web Seareh API - Google offers free Web serviees 
client software for personal, non-commercial use. After registering for the service, the 
Web services client is able to send a SOAP message containing a Web search query to 
Google’s Web services host. The host dispatches the query to Google’s internal 
mechanisms for searching its database of indexed Web content and returns a SOAP 
response to client. The response contains the Web page hits generated by the submitted 
query. 

The OAKDA application uses a Web services interface to perform 
Google Web searches. The response SOAP message is parsed and formatted into the 
HTML pages of the OAKDA application. In this way the user gets an end-to-end 
capability when using the application. They would otherwise have to transfer to a Web 
search portal when they want to perform their Web searches. 

c. Presentation Logic: Client Interface 

The Client Interface is the sub-category of the Presentation Logic Tier that 
provisions a user interface which executes on the client but whose business logic is 
downloaded from the server. 

(1) Applets are programs written in the Java^M programming 
language that can be embedded in an HTML page, in the same way that image files can 
be included. Java™ applets are executed in the client Web browsers Java™ Runtime 
Environment (JRE) and are subject to restrictive policies, for security reasons, that 
prevent the running program from accessing the local machine's file system or from 
making network connection to any computer other than the program's originating host. 
The policy is meant to prevent malicious programmers from harming the client side 
computer. 

The OAKDA application creates a client request to download an 
applet from the OAKDA Web server. The applet has add-ins that allow it to 
communicate with the Web server and maintain HTTP sessions. 

(2) TouchGraphi2, created by Alexander Shapiro in 2001, is an 
open-source user interface software used for information visualization and data modeling. 

12 http://www.TouchGraph.com 
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TouchGraph is commonly used for visualization of data points that can be represented in 
directed graph format. TouchGraph was extensively modified for use with the OAKDA 
application because it did not have a native interface for input of the data representing the 
directed graph. 

The modifications enabled the software to accept the information 
regarding nodes and the connections between them. The software renders the 
information in a two-dimensional display interface and simulates “virtual physics” called 
“Spring-Layout’ which serves to spread the data evenly on the presentation screen area. 
Nodes connected by edges will attract each other while all other nodes repel one another. 
At short distances, the repulsion “force” of the nodes is stronger than the attraction force 
of nodes connected by the edges. The result of this is to cause all nodes to self-organize 
in a way where they will not be concealed by other nodes. Also, nodes that are part of a 
strongly connected section of the graph will tend to cluster together while weakly 
connected parts separate. 

The software also provides functionality to manipulate the graph 
data on the screen. The nodes can be repositioned by using “drag and drop” mouse 
commands, and can be rotated in the viewable area. The interface also has slider controls 
that change the intensity of the node repulsion force that is compressing or elongating the 
edge lengths in the graph. This is useful when the graph is crowded by a large amount of 
nodes because the expansion will increase the readability, but at the expense of the 
number of nodes that can fit in the viewable field. The software also has the capability to 
hide whole sections of the graph so it can be scaled down to a manageable size. To 
restore the hidden nodes, the user clicks on an indicator icon that re-expands the graph. 

In the OAKDA application, TouchGraph executes on a client-side 
applet. It is used to render only a portion of the user selected ontology. The nodes are 
represented by the ontology names for class, property, and individual data. TouchGraph 
directional edges connect the nodal information to show the inheritance relationships and 
differences in color gradation further underscore direction of inheritance between classes. 
All classes are depicted as blue boxes while a darker shade of blue is meant to indicate 
that the class is a parent of the lighter shaded class node. 
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(3) BOX Java™ Package - The BOX Paekage is elient-side 
software used for HTTP communieation. The paekage eontains programs that have the 
eapability to manage session and authentieation between Web elient and server. Jeff 
Heaton, who ereated software robots ealled “hots” able to automatieally traverse linked 
Web sites and perform data proeessing on their eontent, authored the paekage. The 
“hots” require the ability to manage session and authentieation to suoeessfully aeeess 
Web site information. 

The HTTP Java™ elass eontained in the BOX paekage performs 
some of the same functions as a Web browser such as Microsoft’s Internet Explorer or 
the Mozilla Firefox. The important contribution of the BOX package to OAKDA is the 
management of HTTP session information. A session is the sum of all HTTP 
eommunieations between a elient and the Web server. The HTTP protoeol on its own 
only manages eommunieations where eaeh request and response pair is independent from 
the all others. Henee, HTTP is eonsidered to be a “stateless” protoeol. It is important for 
the Web applieation to be able to assoeiate the series of request/response pairs in order to 
eorreetly differentiate between the requests and responses of multiple sessions. Without 
this ability, a response could get assoeiated to the wrong elient, and misdirect the end 
user to an ineorreet Web page. 

Heaton’s HTTP Java™ elass has methods to support cookies, 
whieh are messages sent from the Web server and stored on the elient’s host maehine. 
During subsequent eommunieations, the identifier information in the eookie is sent back 
from the elient to enable the Web server to traek a partieular session and the events that 
the session registered. Heaton’s HTTP software is also used by OAKDA to enable HTTP 
sessions between the TouehGraph applet and the Web server. 

3, Business Tier 

The Business Tier in a Multi-Tier Arehiteeture eontains the eomponent programs 
that perform ealculations or data transformations. 
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a. Racer Server 

Racer is a data processing server for knowledge representation systems 
and description logics. Racer was developed by Ralf Molleris, Volker Haarslevi^, and 
Michael Wesselis. Knowledge representation, a field of Artificial Intelligence, focuses 
on the design of formalisms that are both epistemologically and computationally 
adequate for expressing knowledge about a particular domain. (Baader et al, 2003, xiii) 
Description logics is a framework for representing knowledge in a form of individuals, 
classes of individuals and property relationships that describe them. 

Racer acts upon terminological and asserted aspects of ontology data. In 
OAKDA, it is architecturally situated as a middleware between ontology markup files, 
such as OWL-DL, and the programs that need to access or modify them. Racer provides 
reasoning or inferencing as a central service. The algorithms underpinning Racer’s 
reasoning engine guarantee "correctness", "completeness" and "decidability." 

Correctness means that no false conclusions are drawn. Completeness implies that all 
correct conclusions are present and decidability means that there exists a terminating 
program that can complete reasoning. Reasoning allows one to implicitly infer 
represented knowledge from the explicit knowledge contained in the knowledge base. 
(Baader et ah, 2003, 43) As an example, suppose it is explicitly stated that X is to the left 
of Y and that Z is to the left of X. A reasoning engine infers that Z is also to the left of Y 
by the rule of the transitive property, “to the left of’. Racer also provides services to 
either publish to or subscribe (query) from an existing knowledge base. 

Racer divides ontology content in terms of A-Box and T-Box reasoning 
where the A-Box is the set of asserted relationships between individuals and classes or 
between individuals and other individuals. The T-Box is the set of classes and properties 
of classes arranged in a hierarchical inheritance relationship that describe the schema of 
the domain. The classes contain properties that further define them. The “A” in A-box 


13 University of Hamburg, Germany 

14 Coneordia University, Montreal Canada 
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refers to "asserted" eontent, while the “T” in T-box refers to "terminology" and the 
relationships within the T-Box whieh resemble a sehema. 

The following is a list of serviees that Raeer has available to the end user. 
For T-Boxes, Raeer ean answer the following queries: 

1. Class eonsisteney with regard to a T-Box: Is the set of objeets 
deseribed by a Class empty? 

2. Class subsumption with regard to a T-Box: Is there a subset 
relationship between the set of objeets deseribed by two elasses? 

3. Find all ineonsistent elasses mentioned in a T-Box. Inconsistent 
classes might be the result of modeling errors. 

4. Determine the parents and children of a class with regard to a T-Box: 
The parents of a class are the most specific class names mentioned in a 
T-Box which subsumes the class. The children of a class are the most 
general class names mentioned in a T-Box that the class subsumes. 

For A-Boxes, Racer can answer the following queries: 

1. Check the consistency of an A-Box with regard to a T-Box: Are the 
restrictions given in an A-Box with regard to a T-Box too strong, i.e., 
do they contradict each other? Other queries are only possible with 
regard to consistent A-Boxes. 

2. Instance testing with regard to an A-Box and a T-Box: Is the object for 
which an individual stands a member of the set of objects described by 
a certain query class? The individual is then called an instance of the 
query class. 

3. Instance retrieval with regard to an A-Box and a T-Box: Find all 
individuals from an A-Box such that the objects they stand for can be 
proven to be a member of a set of objects described by a certain query 
class. 

4. Computation of the direct types of an individual with regard to an A- 
Box and a T-Box: Find the most specific class names from a T-Box of 
which a given individual is an instance. 

5. Computation of the fillers of a property with reference to an 
individual. Check if certain concrete domains constraints are entailed 
by an A-Box and a T-Box use. 

In the context of the OAKDA application, Racer’s primary functions are 

for query and inferencing of the OWL-DL data. Racer is instructed to upload the set of 

OWL-DL relations into its memory and deduce any extra information present through its 

inference functions. Once the ontology is present in Racer’s memory, OAKDA initiates 

queries that deliver the nodal relationships portrayed in the graphical user interface. An 

example of a query is “retrieve all direct subclasses of Class “X”. A formatted query in 
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Racer’s syntax is sent to the server, where proeessing takes plaee and the answer is 
delivered back to the souree address of the requester. 

Raeer’s implementation language is COMMONLISP. Raeer 
eommunieates via the eommand line or with a TCP/IP network interfaee. It listens for 
TCP/IP requests messages and issues responses eontaining the requested information 
over the same ehannel. It can be eonfigured to handle multiple simultaneous users. 

Sinee no “publish” methods are used in the OAKDA system, there is no need for Racer to 
maintain session state for each user. Racer is aeeessed by OAKDA programs through the 
user of the Java^^ JRaeer API, whieh abstraets Racer funetion ealls into a format eallable 
by Java™ programs. 

b. Ontology Search matching algorithm 

In the OAKDA applieation, the end user searehes a database table 
eontaining the indexed eontent for all OWL-DL files loaded in the system. The indexed 
ontology data are aceessible through JDBC ealls to MySql database that stores the 
information. Substring matehes are pulled from the database using the SQL “LIKE” 
eommand. A substring mateh oeeurs when one string partially or fully matehes another 
string. "Regular Expressions" are used in later proeessing to formulate a mateh eloseness 
seore. A Regular Expression is a pattern of eharaeters that deseribes a set of strings. 
OAKDA uses a regular expression eonstruet provisioned by the Java™ “String” elass. 

When a substring is found, a seore is eomputed that rates the "match 
closeness" (MC). This works as follows: the Query String (QS) is matehed against the 
Indexed Ontology Element (lOE) string. The string Eength Differenee (ED) between QS 
string length and lOE string length is eomputed. If QS matehes with lOE, the MC seore 
is ealculated as follows: 

MC := 1 - ( LD lOE string length ) ; 

Eor example, suppose QS and lOE are given as SENT and PRESENTER. Since SENT is 
a substring of PRESENTER, the mateh seore will be eomputed as: 

LD := 5=9-4 

lOE string length := 9 

MC:=0.44=1- (519) 
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If QS and lOE are given as PRESENT and PRESENTER the score will be 0.77, which is 
correctly ranked higher than the previous query. Each indexed node is given a score 
between 0 and 1 to rank its relevance to the match. 

Users performing a search are not limited to a single search term. Since 
the indexed ontology element often is a phrase, more than one term can apply when 
searching the database. OAKDA’s programs will parse an element named 
TropicalHumidClimate as TROPICAL HUMID CLIMATE. A ranking algorithm 
was designed to handle cases where the closeness of match across a phrase needed to be 
computed. 


Consider an example where the user query string is TROPIC HUMID and 
the indexed ontology element is TROPI CAL HUMID CLIMATE. The overall match 
ranking will be calculated in the following way and as shown in Table 8. 



lOE 

QS 

TROPICAL 

HUMID 

CLIMATE 

TROPIC 

0.75 

0.0 

0.0 

HUMID 

0.0 

1.0 

0.0 

Table 8. ] 

Preliminary String Match Matrix 


1. The best match will be computed for each word. The score 
computation matrix for individual string matches is shown in Table 2. 

2. The highest scores (in bold) will be used in the overall computation. 

3. Any lOE term that did not have a positive score will be assigned the 
average score for the all terms. 

Average= 0.58 = (0.75 + 1.0 + 0.0 ) L 3 

4. The total score is the product of all scores: 

Score = 0.44 = 0.75 * 1.0 * 0.58 

The intention behind ranking the match similar to the ranking of page hits 
in a Web search. The ontology content with the highest score will have the closest match 
to the given search terms. The closer the score is to one, the higher ranking in the 
displayed list. This algorithm is crude but fairly effective. Euture enhancements would 
include methods similar to those used by Web search portals. 
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c. Ontology Batch Loader 

Since users benefit when there is an ontology in OAKDA’s repository that 
eontains the knowledge domain of their interest, the application contains functionality 
that enables end users to make eontributions to the database. OAKDA enables users to 
submit ontology data for other users. OAKDA provides a webpage that allows end users 
to input the URL of an OWL-DL ontology that is hosted on the Web. When the URL is 
provided and the end user elieks the submit button, the OWL-DL downloads to 
OAKDA’s host server. Bateh loader programs eall Raeer functions that read and parse 
the class, individual and property nodes contained in the ontology and load them into 
OAKDA’s internal database. 

4. Data Access Tier 

The Data Aeeess Tier eomprises programs in the systems that are used to 
interface with the Data Tier. 

a. RICE JRacer API 

The JRaeer API is a package of Java'^’^ elasses with capability to format 
and invoke, via TCP/IP soekets, request messages formatted as COMMONLISP Raeer 
Server commands. The API also can listen for Raeer’s TCP/IP response and bind the 
response to Java™ objeets. The methods of the API mirror those of the Raeer server 
exeept that JRaeer invoeations ean be defined in terms of Java^^ syntax and use Java™ 
typed objeets. 

JRacer was developed as a part of an ontology visualization projeet ealled 
RICE 16 (Raeer Interaetive Client Environment), built by Ronald Cornet of the University 
of Amsterdam. In ereating the JRaeer interfaee. Comet provides an easy to use, reusable 
and doeumented interfaee between Raeer and Java™ programs. 

b. JDBC 

Java Database Conneetivity (JDBC) is a Java API built into DBMS 
systems whieh allows Java programs to mn SQE statements. 17 In OAKDA, Java 


16 http ://www.b 1 g-systems.com/ronald/rice/ 

17 ODBC is another well known API and is parallel to JDBC, but it is language-independent. 
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programs use JDBC to execute the SQL that queries and updates data stored in its MySql 
Database RDBMS. 

5. Data Tier 

The Data Tier is the layer of a Multi-Tier Architecture that stores the application data. 
a. MySql Database 

MySqlis is a freei9 multi-user database server. It is ideal to use a database 
server in a Web-based project because several end users may use the site simultaneously. 
MySql supports the SQL query language used to query and update data stored in 
relational tables. MySql also supports TCP/IP connectivity. Unlike some other popular 
database servers, MySql does not support transactional processing or referential integrity 
in its tables. The section below describes two of OAKDA’s key database tables, 
outlining their structure and functionality. 

(1) Indexed OWL-DL Content Table - This table contains an 
indexed version of OWL-DL ontology documents loaded into the OAKDA system. The 
information contained in the OWL-DL flat file are read, parsed, vetted and processed by 
the Racer server and loaded into the MySql index table by Java programs using JDBC 
and SQL Data Manipulation Language (DML) statements. Table loading occurs when 
the system administrator or an end user of the site wishes to add OWL-DL content to the 
OAKDA data store. 

Each term contained in the ontology is referenced in a MySql 
database table that has the following columns: NODE TEXT, NODE TYPE, 
NAMESPACE and OWL EIEE. The matrix shown in Table 9 describes the Ontology 
Index Table. 


18 http://www.mysql.com/ 

19 General Public License (GNU) 
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Field Name 

Data Type 

Description 

NODETEXT 

String 

The name of the OWL-DL term 

NODETYPE 

String 

Describes the OWL-DL type of the named term 

(Class, Property, Individual, Role, etc) 

NAMESPACE 

String 

The URI of the OWL-DL schema 

OWLEILE 

String 

The name of the OWL-DL fiat file 

DATEINDEXED 

Date 

The date on which the file was loaded into OAKDA 

Table 9. M 

[etadata Descriptions for Indexed OWL-DL Content Table 


Table 10 shows an example data that would be stored in the 
“Indexed OWL-DL Content Table.” 


NODE_TEXT 

NODE_TYPE 

NAMESPACE 

OWL_FILE 

DATEJNDEXED 

Person 

class 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

GrandMother 

class 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

Grandparent 

class 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

MaleSex 

class 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

GrandFather 

class 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

Parent 

class 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

Gemma 

Individual 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

Peter 

individual 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 

Matt 

individual 

http://www.owl-ontologies.com/generations.owl 

generations.owl 

20051112 


Table 10. Example data for Indexed OWL-DL Content Table 


The data in Table 10 is a transformation of the OWL-DL content. 
The reason this is done in advance is to enable quick lookups during the term searching 
portion of the program. This can be done by Racer but it would be prohibitively slow and 
resource intensive if Racer was used to read and perform reasoning of all OWL-DL file 
content each time search is requested. It is far less memory and CPU intensive to search 
an indexed list since no reasoning is necessary for term search. 

(2) Selected Search Term Table - This table stores the items 
selected by the end user during terminology discovery which is later applied to the Web 
search. Since the applet is running in a separate HTTP session, the user selections and 
HTTP session ID must be stored for later retrieval by the application. The following 
matrix, in Table 11, shows the description of the Selected Search Term table. 
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Field Name 

Data Type 

Description 

SESSIONID 

String 

The name of the HTTP session that launehed the 
TouehGraph applet. 

SEEECTED NODE 

String 

The text of the OWL element node 

DATE SEEECTED 

Date 

The date/time when the item was saved in the table 

Table 11. 1 

VIetadata Deseriptions for Seleeted Seareh Term Table 


b. OWL-DL Ontology File 

Web Ontology Files eontain the representational data used by OAKDA to 
find eonneeted terminology information. In OAKDA, only OWL-DL files are proeessed 
by the RACER engine. 

6. OAKDA Client and Server Components 

Table 12 delineates, from a physieal standpoint, where eaeh eomponent in 
OAKDA’s Arehiteeture resides. This is intended to provide a quiek summary of the 
physieal loeation of eaeh eomponent of the system. The “System Component” eolumn of 
the table lists eaeh part of the OAKDA system. The “Client” eolumn is the eomputer 
used by the end user of OAKDA to aeeess the system. The "OAKDA Host” eolumn is 
the eomputer that is the server for the system. The “3"^'* Party Server” eolumn eontains 
those eomputer resourees that are leveraged by OAKDA via the Web. 


OAKDA Components 

System Component 

Client 

OAKDA 

Host 

3*^** Party 
Server 

Apaehe Tomeat App Server 


V 


Apaehe Web Server 


V 


Google 



V 

Google Web Serviees 


V 

V 

HTML / CSS / JavaSeript 

V 



Java™ Objeets 

V 

V 


Java^M Applet 

V 



Java™ Servlet 


V 


OWL-DL Ontology Pile 


V 


Raeer Server 


V 


TouehGraph 

V 



Web Browser 




MySql Database 


V 



Table 12. OAKDA Component Physieal Loeation Matrix 
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D. OAKDA PROTOTYPE PROCESS FLOW 

In the previous section, the components of OAKDA were introduced individually. 
This section explains how OAKDA components interact and details the sequence of the 
interaction. The diagram in Figure 78 provides a synopsis of all the communication 
pathways of OAKDA. The diagram is divided into three sections to indicate network 
boundaries: the Server Side, where the OAKDA system is hosted, the Client Side, where 
the client computer and the end user of OAKDA reside, and the Internet, where Google'^'^ 
web services are located. All interactions traversing these boundaries are TCP/IP based 
network messages. There are three different modes of communication between the 
software and hardware system components: TCP/IP, direct program calls between Java™ 
programs, and System I/O 20 and disk I/O 21 . 



Figure 78. OAKDA Component Messaging Pathways 


20 These are standard input and output from the elient eomputer, sueh as video, mouse elieks and 
keystrokes. 


21 For Raeer's read of the OWL-DL fdes. 
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1, Anatomy of an OAKDA Search 

In order to illustrate the complete workings of OAKDA, shown in Figure 1, this 
section describes a typical OAKDA usage scenario. This description touches upon all the 
components shown in Figure 78 and provides an overview of all interactions and 
processes underlying the system. 


Firstly, it is assumed OAKDA will be used to refine Web search or to discover 
knowledge about a domain. We will suppose a user exists who has a search topic of 
interest in cartoons and an initial search term: "Jerry." The user launches the OAKDA 
home page to begin the process. Figure 79 shows that this HTTP requests/response 
communications is initiated with the Apache Tomcat Web server. The home page is 
composed of static HTML content. Figure 80 shows the OAKDA home page. 



Figure 79. OAKDA Client / Web Server Interaction 
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3 Ontolgy Aided Knowledge Discovery Application • Microsoft Internet Explorer provided by Comcast 


Eile Ed* Fflvorites lools tieip 

• O L»1 ^ y Search Favorites e • ^ ^ ^ £i 


IJMI 



Figure 80. OAKDA Home Page 


Next, the user enters the search term, “Jerry”, into the client side HTML form and 
presses the submit button. This action generates a HTTP request containing the search 
term as an argument to the Web server which is then dispatched to the Tomcat 
application server. Tomcat instantiates a program event that is handled by a specific Java 
servlet. The event invokes a Java program to create a SQL string, with search term 
“Jerry” incorporated, to query the MySql Database. Messages between Java and MySql 
are accomplished via the JDBC API. The database table queried contains records of 
indexed ontology content. The query returns the list of ontologies where the search term 
is found. The query results are passed back to the Java servlets where they are ranked by 
match closeness, incorporated into a HTML document and sent to the client via HTTP by 
the Web server. In this case, the term “Jerry” was found in the OAKDA database in an 
ontology called “Cartoon Star”. This interaction is depicted below in Figure 81. 
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Figure 81. Client interaction with OAKDA Database 


At this point the client presentation is showing a ranked list of found ontology 
content and a links to the source OWL-DL files where each match was found. The user 
selects the “Cartoon Star” ontology by pressing a button next to each result row. This 
action sends HTTP messages to Tomcat which are dispatched to Java servlets that create 
an HTML document to be sent as a response back to the client browser. HTML 
<APPLET> tags in the response direct the browser to download and instantiate a Java 
applet running on the browser’s Java Virtual Machine22 (JVM). The Applet is used to 
render OWL-DL data in the form of a directed graph for the user interface using 
TouchGraph software. The first action the applet does is to send an HTTP request 
message to the Web server which in turn accesses Racer via the JRacer middleware. 
Racer is instructed to read the “cartoon star.owl” file and query it for all nodes directly 
associated to “Jerry.” Description logics based reasoning executes as the ontology is 
22 A component of the Java platform which executes Java Bytecode. 
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instantiated in Racer’s memory. [See appendices 1, 2 for sample code^ A list describing 
all ontology information connected to “Jerry” is passed back to the Java Servlet which 
instantiates a Java bean object to be used as a container for this information. The Java 
bean is serialized23 and sent, via the Web server, back to the client applet as HTTP data. 
The Applet then un-serializes the data back into a Java bean object. This object is passed 
to the TouchGraph program and used to render the nodes and edges of the directed graph 
depicting the OWL-DL ontology on the end user’s client presentation. This chain of 
communications is shown in Figure 82. 



Figure 82. Client Applet Interaction with Racer Server 


23 Serialization is the proeess of saving an objeet onto a storage medium in order to later be able to re- 
ereate an object that is identical in its internal state to the original. 


134 












































































3 TouchGraph GraphLayout - Microsoft Internet Explorer provided by Comcast 


File Edit Wew Favorites Tools Hetp 

©Back - ^ ^ ^ s«ch €); '-, til - £l 


Address http://Vxa>)ost/OA>IiA/sefvlet/powers.servlets.TouchGfapb?rwJeftiame»%70<tp%3A%2F%2Fa.com%2Fof)totoqy%23>rry%7Carx>deType«indi^^ Go 



Figure 83. Graphical Visualization of the Cartoon Star Ontology 


The client interface is now displaying a directed graph showing all the ontology 
nodes connected to “Jerry” in the Cartoon Star ontology (Figure 83). Some of the data in 
the display shows that “Jerry” is an Individual node instantiated from the 
“Disney Mouse” class and that it is related to another individual named “Tom” through 
the “In Same Cartoon Series” property. The user initiates an event to re-orient the 
graph around another node. By right clicking the individual, “Tom”, and selecting the 
“orient about node” option, the applet sends an HTTP message to the Web server that in 
turn dispatches an event to a Java servlet which, via JRacer middleware, queries Racer 
for all the OWL-DL data directly related to the “Tom” node. The data is incorporated 
into a Java bean, serialized and sent back to the client applet. The applet then re-displays 
a new directed graph centered on the individual, “Tom.” Some of the changes to the 
screen now show that “Tom” was instantiated from the “Disney Cat” class, but the 
“Jerry” node is still present since it is related to “Tom” by the 
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“In_Same_Cartoon_Series” property. The eomponents used for this event sequence are 
represented in Figure 83. 

When the end user discovers the terms needed for the Web search in the displayed 
graph, the term is selected by right clicking on the node and choosing the “add as search 
term” option on the “Tom”, “Jerry” and “Disney_Cat” nodes. This event causes the 
applet to send an HTTP message to the Web server dispatching a Java servlet to create a 
SQL “insert” statement containing the selected ontology values along with the session ID 
of the current HTTP session. The session ID is later used to retrieve the entire set of user 
selected search terms that are stored during that session. The servlet executes the SQL 
statement using JDBC API in the MySql database. The end result is a record added to a 
database table used to store user selected search terms. These component interactions are 
shown in Figure 84. 
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When the user has chosen search terms and pressed the button labeled “Search the 
Web” on the applet presentation view, an HTTP request is sent to the Web server which 
dispatches a servlet to create a SQL statement that accesses all the stored search terms 
chosen during the current Web server session. The servlet executes the query, retrieving 
“Tom”, “Jerry” and “Disney Mouse” from the database. This data is incorporated into an 
HTML document that is sent back to the client browser for display. The applet running 
in the browser’s JVM is terminated. The presentation now consists of the list of selected 
search terms in an HTML form designed to allow the end user to edit or modify the 
content. This action uses OAKDA components shown in Figure 81. 

The user is now ready to initiate the Web search. By pressing the “Submit 
Query” button on the client presentation, HTTP messages are sent to the Web server 
which dispatches a Java servlet to incorporate the message data into a SOAP envelope 
destined for Google’s Web Services portal. The Web server routes the SOAP message 
request and receives Google’s SOAP HTTP response. Contained in the response is the 
listing of Web search hits that Google found in its database. This information is 
integrated into an HTML document and sent back to the client browser by HTTP. This 
interaction is shown in Figure 85 below. The OAKDA screen shot of the web links 
retrieved by Google Web services is depicted in Figure 86. 
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Figure 85. Client Interaction with Google Web Services 


3 Search Results • Microsoft Internet Explorer provided by Comcast 




Eite gdt ^ Favorires loots UeJp 

Qb«* . q l 3 y €> ^ £l •'iJ 

Address http:/flocalx»st/OA>XiA/sefviet/povws.scyvtets.WobSeafch>pageStartH36*extCM)IsrCY-M><CllSEaTadfcO-andM:extl-JERRY8fadiol»arid6*ext2-TOM6ya^ v* Go 



1 ^- 


O.A.K.D.A. 


Web Search Results 
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Tom and Jerry (MGM) - Wikipedia, the free enryiriooedia 
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Respectful Insolence. Tom and Jenv A nefarious Jewish plot 
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sarcastically inform the good Professor, was a Hanna-Barbera cartoon ... 
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Before the Cat and Mouse Van Beuren's Tom and Jerry 

Almost a decade before MGM’s famed cat-and-mouse team. Amadee J. Van Beuren's 
New York studio had another pair of stars named Tom and Jerry ... 
http //WWW cartoonresearch comAomjerry/ - 5k 


MEMRI TV 

They like it very much, and so do adults - Tom and Jerry [ ] Some say that 
this creation by Watt Disney will be remembered forever ... 
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Figure 86. OAKDA Web Search Results 
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E, OAKDA PROTOTYPE EVENT SEQUENCE AND PROCESS FLOW 

This section elaborates on the previous section and provides Sequence diagrams 
of the interactions of OAKDA processes when initiated by client generated events. A 
sequence diagram is a UML graphical construct used to show the sequence of 
interactions between object instances in a software based system. A sequence diagram is 
specific, referencing the method name of the function called in the interaction. The 
specific type of sequence diagram is a useful tool for a programmer for implementing a 
UML specification. For the purposes of this thesis, a more general type of sequence 
diagram will be used, called a Service Level Sequence diagram, which shows logic in 
detail but does not reference specific program function points. Following the diagrams 
and other depictions show a clear picture of the operation of the system. 

1. OAKDA UML Sequences 
a. Home Page 

This section describes the events and programs that launch OAKDA’s 


Home Page, the first step taken by the end user of the system. Figure 87 shows the 
associated sequence diagram. 



Figure 87. Home Page Sequence Diagram 
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In the starting state for this sequenee, the client has not yet invoked 
OAKDA. The necessary preconditions are the end user has an Internet connection and 
Web browser. The sequence initiates when the end user launches a Web browser and 
enters the Web address for OAKDA. The browser sends a HTTP request to the OAKDA 
Web server. The Web server deciphers the request and sends the contents of the 
Index.html document back to the client as an HTTP response. The browser receives the 
response and renders the HTML to show the OAKDA home page on the client Web 
browser. 

b. Ontology Search 

This section describes the system events and programs that are activated 
when the end user initiates a search of the indexed OWL-DL content stored in the 
OAKDA database. The end result of the action yields a display of the search results in a 
ranked list. Figure 88 below shows the associated sequence diagram. 


Service Level Sequence Diagram: 

Ontology Search 

Action: search ontology database for search terms 


:Ciient End User 


iCiient Browser 

- 1 -' 


Tomcat Server 
- 1 -' 


;Java Servlets 
-1-' 


:MvSal DB Server 

1 


User Action: clickform 
"submit" button 


submit: HTTP 
Request (search- 
term) 


I search (search-term) 


JDBC(SQL) 


SQL result Set 
Build HTML page 


User presented with 
\ Knowledge Base Search 
results page 


HTJP_R®sP9n_s_® 


HTML 


Figure 88. Ontology Search Sequence Diagram 


In the starting state, the OAKDA home page is displayed. The home page 
consists of a form used to search the indexed OWL-DL content. There are no other 
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necessary preconditions. The process initiates when the end user enters search terms into 
the form and clicks the submit button. 

Next, Tomcat Server receives a HTTP request and dispatches a call to a 
Java Program to compile and execute a SQL query against the data in the 
OWL_NODE_INDEX table in the MySql database. The query looks for partial or full 
string matches. Eor each record in the result set of the query, a score is computed that is 
later used to rank the closeness of the match with other rows. The data from the search is 
sorted by rank and formatted into the HTML content on the Search Results Page 
displayed on the client browser. 

c. Ontology Search Results Page 

This section describes the events and programs that construct the applet 
interface used for ontology navigation and terminology discovery when an OWL-DL file 
is selected. Eigure 89 shows the associated sequence diagram. 


Service Level Sequence Diagram: 

Ontology Search Results Page 

Action: Starting TouchGraph Applet 



Eigure 89. Ontology Search Results Page Sequence Diagram 
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In the starting state for this sequence, the Ontology Search Results Page is 
displayed. The page shows a sorted list of ontology nodes resulting from the Ontology 
Search Page. The list of the matched content from the search is ranked by the closeness 
of the match. The columns in the table are: 


a. Score - A real number between 0 and 1 that describes the closeness of the 
match. This list is sorted in ascending order. 

b. Element Name - The OWL-DL element name of the node in OWL-DL 
content. 

c. Type - The element’s OWL-DL type can consist of a CLASS, 
PROPERTY or INDIVIDUAL. 

d. Lile - The name of the OWL-DL file where the element name was found. 


The sequence starts when the user selects one of the buttons adjacent to a 
list item. At the end of the sequence the user is presented with graphical rendering of the 
ontology data in the form of a directed graph showing all nodes closely linked to the 
selected search term. 

Lirst, the element name, OWL-DL type, and OWL-DL file are passed as 
HTTP arguments to the Tomcat application server. Tomcat calls a servlet that builds an 
HTML response containing a <APPLET> tag reference to a Java™ program with the 
HTTP session ID and the OWL-DL arguments embedded in the page. The HTTP 
response is sent back to the client side browser. In rendering the HTML, the client 
browser is instructed to download the Java^M applet code from the Web server. The 
applet establishes a second HTTP session, sending the OWL-DL arguments to the 
application server. 

Tomcat receives this message and responds by invoking a Java program 
that uses the JRacer API to communicate with the Racer server. The OWL-DL element 
names are used as arguments in the invocation. The first action performed is to check if 
the referenced OWL-DL file has already been loaded in the Racer server’s memory. If it 
is not, a call is made requesting Racer to read the OWL-DL file. Racer’s internal 
programs cause it to load, parse and execute description reasoning algorithms. 

After the program returns a message that the Racer has successfully 
loaded the OWL-DL markup file, the servlet program performs a series of calls designed 
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to query Racer for elements directly related to the OWL-DL element name in the query 
statement. There are three types of queries that can be made. These are differentiated by 
their element type which can be one of the following: CLASS, PROPERTY, or 
INDIVIDUAL. This is shown in Figure 90, 91 and 92, respectively. 


Query Type: Class Class marentt 


Class 

-»-’ 


Individual (direct) 



Class (Child) 



Figure 90. OWL Elements Directly Related to CFASS Type 

When the OWL-DF argument of the query is a CFASS type, as in Figure 
90, the servlet will query Racer for parent classes, sibling classes, child classes and direct 
instances. The query element “Class” is the central element of the graph. Arrows 
showing the inheritance relationship by their direction, link related elements. These 
queries only vary by the OWL-DF “type” of the argument. 

Query Type: Individual 

CISSS (direct t ype) [^ | 

Individual (sibling) 

[ Individual ) 

Property 

Individual ^ang^_ _ || 

Figure 91. OWL Elements Directly Related to INDIVIDUAF Type 
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Query Type: Property 

Indiyidual (domain) 

Class (belongs 


Individual (range) 

Figure 92. OWL Elements Directly Related to PROPERTY Type 

Eigure 91 and 92 depict the elements directly related to queries on the 
OWE-DE INDIVIDUAL and PROPERTY type, respectively. The query output is 
arranged into list that captures the pair wise relations of the element names. This list is 
contained in a Java™ bean24 object and serialized25 to “freeze” the object in its current 
state. 

Tomcat routs the serialized output back to the client applet via HTTP. The 
client applet reconstitutes the data into a Java^M bean with the same state it had on the 
server side. The bean is passed as an argument into TouchGraph program methods that 
render a graphical representation of the relation pairs in the user interface. The element 
names become the labeled nodes in the two-dimensional graph and the inheritance 
relationship of the related nodes determines the arrow direction of the edge that connects 
them. The user interface displayed at this point is used for ontology exploration and 
selection of node names for a later Web search. 

d. Ontology Exploration Applet GUI 

This section describes the events and programs that enable the end user to 
traverse an ontology displayed by the TouchGraph Applet. When the user selects a new 
node as the focal point of the ontology, the graph reorients about that node. Eigure 93 
shows the associated sequence diagram. 

24 A Java Bean is simple Java Class that has “set” and “get” methods for each of its properties. 

25 “Flattening” a Java object into a persistent format such as a file or stream object. 
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Figure 93. Ontology Exploration Function Sequence Diagram 


The starting state is for the applet to display a portion of the ontology in 
the TouchGraph applet interface. When the user double-clicks on a given node, or right- 
clicks on a node and selects “orient about node”, the applet will redisplay with the 
selected node as the focal point of the graph. 

When the listener in the TouchGraph applet detects the mouse event, it 
responds by sending a HTTP message to Apache/Tomcat server. The message has HTTP 
parameters which specify the element name selected, the element type, and the OWL-DL 
file to which it belongs. Tomcat detects the messages and invokes Java™ programs 
using the JRacer API to communicate with the Racer server. The program creates 
TCP/IP calls to query the Racer server for all OWL-DL elements related to the new node. 
Just like in the section above, the information is passed down to the client applet and 
redisplayed on the user interface. 

e. Ontology Node Selection 

This section describes the events and programs that enable the end user to 
select a term from the displayed ontology to be used for Internet search. Figure 94 shows 
the associated sequence diagram. 
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Figure 94. Ontology Node Selection for Search Sequence Diagram 


In the starting state, the TouchGraph applet is displaying on the client 
GUI. There are no other preconditions for this sequence. The initiating action starts 
when a user right-clicks on a node and selects “add as search term.” At the ending state, 
the selected term is stored in the OAKDA database with a reference to the HTTP session 
ID of the HTTP session that launched the applet and the TouchGraph applet displays the 
terms in a list box. 

When a listener on the client side applet detects the mouse event, HTTP 
messages are sent to the Tomcat server. Tomcat then dispatches Java™ programs which 
invoke a SQL INSERT statement via JDBC to the MySql database server. The SQL 
statement directs the database to store the selected term and its HTTP session ID to the 
SELECTED SEARCH TERM database table. 

/. Search Pre-Processing Page 

This section describes the events and programs that construct the Web 
page which enables the end user to “fine tune” the Internet search query. Eigure 95 
shows the associated sequence diagram. 
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Figure 95. Web Seareh Pre-Proeessing Page Sequenee Diagram 


In the starting state, the OAKDA sereen shows the Ontology Exploration 
GUI. There is a neeessary preeondition that seareh terms related to the current HTTP 
session are stored in the MySql database. The initiating action for this sequence occurs 
when the end user clicks on the “Search the Web” button on the Ontology Exploration 
page. The ending state shows the user a Web page displaying the search terms in editable 
text boxes with radio button options by each term that enable the user to further configure 
the syntax of the Internet search. 

After the “Search the Web Button” is pressed, an HTTP request is sent to 
the Tomcat server which invokes a Java^M program to fetch records stored in the database 
SEEECTED SEARCH TERMS table. The SQE statement uses the HTTP session ID as 
the key to find the records connected to the session of current OAKDA user. Some post 
processing takes place to reformat the list of terms for the Web search. Since the terms 
are often concatenated together either with characters or by upper/lower case 
transition, programs are invoked to parse the strings from their OWL-DE format into 
their component words and convert them to upper case text. The namespace information 
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on each OWL-DL element is discarded. Table 13 lists examples of typical 
transformations. 


Typical OWL-DL format 


Web Search Format 

1 http://abc.org/owl#Animated_cartoon| 

A 

ANIMATED CARTOON 

1 http://xyz.net/owl#HooverVacuumCleaner| 

A 

HOOVER VACUUM CLEANER 


Table 13. OWL Element Transformations 


This list of search terms is incorporated into the HTML page and sent back to the Tomcat 
server to be returned to the client browser as an HTTP response. 

The displayed page is used to further configure the search terms for the 
Internet query. The form has button controls used for configuring the search with syntax 
used for the Google™ Web portal. The HTML form for this page has editable text 
boxes, radio buttons, and a submit button. The text box contains the search terms 
selected by the user during ontology exploration. These terms may be edited by the user 
to change the word or correct their spelling. The radio buttons are used to control the 
way the terms are used in the Web search. They enable the user to choose one and only 
one of several options. The “AND” radio button choice is the default selection and 
represents a Boolean “AND” for the search. The “OR” radio button implies that the 
search will apply logically “OR” the term with any other term that has the “OR” option 
selected. The “NOT” selection formats the Web search query to find Web pages that do 
not contain the term. The “REMOVE” option ignores the search term so that it will not 
be incorporated into Web search at all. 
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This search will look for pages that have the string apple in the content 
but not COMPUTER 


SEARCH TERM 

AND 

OR 

NOT 

REMOVE 

APPEE 

0 

o 

o 

o 

COMPUTER 

o 

o 

0 

o 


Figure 96. Example Eogical “AND” Search Syntax 


This Search will look for pages that have either APPEE or ORANGE but 
do not contain COMPUTER 

SEARCH TERM 

AND 

OR 

NOT 

REMOVE 

APPLE 

o 

0 

o 

o 

ORANGE 

o 

0 

o 

o 

COMPUTER 

o 

O 

0 

o 


Figure 97. Example Eogical “OR” Search Syntax 


This search will look for pages that contain APPLE or PEAR. 
ARTILERY is ignored and is not present in the Web search 

The term: 

SEARCH TERM 

AND 

OR 

NOT 

REMOVE 

APPLE 

o 

0 

o 

o 

PEAR 

o 

0 

o 

o 

ARTILERY 

o 

O 

o 

0 

Figure 98. Example of Logica 

“OR” and REMOVE Search Syntax 


Figures 96, 97 and 98 above show examples to of how the search 
configuration form assists the end user to configure discovered content into a Web search 
query. Figure 96 shows an example using a logical AND & NOT to retrieve web pages 
that contain “apple” but not “computer.” Figure 97 is similar to 96, except the documents 
retrieved should have either “orange” or “apple” found in the content but not “computer.” 
In Figure 98, the selection of the REMOVE option nullifies the inclusion of the content 
into the query. This means the pages retrieved will have “apple” or “pear” in the content 
but the “artillery” string will have no bearing on the search. The form configuration is 
used as input to be translated into search syntax used in the Google Web services 
interface. 
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g. Google Web Search Page 

This section describes the events and programs that create the Web page 
showing the Internet search links from the Google Web portal. Figure 99 shows the 
associated sequence diagram. 



Figure 99. Google Search Page Sequence Diagram 


In the starting state, OAKDA’s client is displaying the Search Pre 
Processing Page. The action that initiates the process is when the user presses the 
“Search the Web” button on the Search Pre-Processing page. The ending state for this 
action displays an HTML page showing links to Web pages from Google™ Web 
Services that meet the criteria of the Internet search query. 

After the initiating action, the client browser sends an HTTP request 
containing the Web search query in the Web form to the Apache/Tomcat server, which 
invokes the appropriate Java™ program for the next processing step. The program then 
constructs a Web services SOAP XML “envelope” to be sent the Google™ Web Services 
API using the HTTP protocol. Included in the envelope call are the formatted Web 
search parameter string and a registration key required for Google™ authentication. The 

150 














Web service sends back a SOAP response via HTTP that is dispatched by 
Apache/Tomcat server to programs which unwraps its contents. The data contained in 
the response is the very similar to content displayed in a typical Google™ Web site 
search. This data is incorporated into the HTML by the server side Java™ programs. An 
HTTP response containing the HTML document is sent back to the client Web browser. 
The view on the client is similar to a Google search result page. The user can access 
HTML anchors to visit the Web content found by the Google™ search. 

F. CONCLUSION 

The most significant aspects of the OAKDA are the Multi-Tier architecture, 
description logics reasoning services, and GUI visualization service that enables directed 
graph representation of the ontology. The bulk of effort and research for the 
development of OAKDA was spent in these areas. These three contributions to OAKDA 
are discussed in detail below. 

1, Multi-Tier Architecture 

The Multi-Tier architecture enabled an effective prototyping methodology to be 
employed in the development of each system component. As new methods and party 
software were evaluated, adopted or discarded, the Multi-Tier framework kept 
dependencies from effecting adjacent components. The architecture allowed extensive 
re-use of mature tiers through development iterations without the need for modification. 
This enabled tier integration and reintegration to be far less complicated then it could 
have been under a different architecture. Whenever there was a failure to separate tiers 
with a loosely coupled messaging framework, a penalty was paid in code re-writes when 
changes had to be accommodated. 

The level of development effort to bring OAKDA to completion required almost 
10,000 lines of Java™ code in 50 class files. Java™ should be mentioned as significant 
aspect of the architecture. Java^^ was the cement used to integrate OAKDA’s many 
components and was used to implement all business logic. Java’s™ language support for 
networking (TCP/IP), useful utilities such as object serialization, and its ability to run on 
multiple platforms, enabled key design choices to succeed. 
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2 Racer 

When development of OAKDA projeet first stated, the Jena API was evaluated as 
a possible middleware ehoiee for the OWL-DL proeessing. The main reasons for interest 
in Jena were that the API was written in Java™ whieh was to be the language of ehoiee 
for OAKDA’s other eomponents. After some effort to learn the API, it was found that 
Jena eould parse and query OWL-DL effeetively, but its reasoning serviees were diffieult 
to diseem. Jena doeumentation pointed to a eapability for interfaeing with party 
inferenee tools but the meehanisms for doing this were not readily understandable. At 
the time, Raeer was being used as an add-in to the Protege applieation to assist ontology 
development. Raeer provided serviees to assist the developer to identify eonfiiets within 
an ontology and was used by Protege to rearrange taxonomy strueture to a more suitable 
form. It appeared that Raeer eould parse, query and reason with OWL-DL and the only 
obstaele to utilizing it in OAKDA was to find a method to ineorporate it into the 
arehiteeture. Raeer’s TCP/IP based interfaee allowed the software to work with any 
programming languages eapable of forming and sending TCP/IP messages. Raeer’s 
native messaging framework uses a Lisp style syntax to query or publish to the Raeer 
server. Fortunately, a small Java'^’’^ based interfaee for Raeer was already developed, 
ealled JRaeer, whieh abstraeted TCP/IP eommunieations and Raeer’s Lisp style syntax 
into Java funetion calls. Racer is server based software, which lends itself well to the 
Web server based architecture of OAKDA, since both Racer and the Web server need to 
handle simultaneous connections. After this discovery. Racer was selected as a 
component of the project design and the Jena API was abandoned. The incorporation of 
Racer and JRaeer greatly accelerated the development of the system. 

Racer had a certain drawbacks that may not be present with other inferencing 
middleware. For example. Racer has no straightforward method to query the property 
restrictions in class definitions. In OAKDA’s visualization component, it may have been 
useful to show, along with the class name, the restriction statement that defines the 
attributes an individual must possess to be a member of the class. Instead, OAKDA only 
shows these restrictions when class instantiations, or individuals, are selected because the 
property restrictions are manifested as links between individuals. The effect of this on 
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the OAKDA system is that any OWL KB’s without individuals do not reveal property 
restrietions and tend to appear as simple taxonomies. This seems to be a feature of 
Racer’s description logic model. While restrictions can be published when a class is 
defined, it is difficult to query that information from the schema representation inside 
Racer’s memory. The asserted content represented is easier to access. As a result, only 
in asserted content, are the class properties able to be expressed. 

3. Visualization 

It is difficult to visually decipher the patterns and meaning of large matrix of data 
without creating some type of representational image. Graphing techniques exist to help 
comprehend the meaning in a collection of data. OAKDA’s usefulness to the end user 
greatly depended upon providing an effective visualization framework to amplify human 
cognition and navigation support of OWL-DL data. 

OWL-DL’s topology is best described as a directed graph to represent complex 
inheritance, and property relationships between the RDF resource nodes. In OAKDA’s 
development, approaches using HTML/CSS did not yield good results because this 
format provided no easy means to mirror the topology of the OWL-DL documents. 
TouchGraph software is designed to represent directed graphs and thus readily able to 
depict OWL-DL. TouchGraph’s capability to self-organize to fit the available screen 
space ensured the data was distributed evenly on the viewing area and did not obscure the 
content even when it was densely populated. TouchGraph necessitated the development 
of Applets running on the client browser which could communicate with the Web server. 

The system was lacking in organizing the inheritance relationships in a way that 
could display directional relationships between nodes in an overarching pattern. Since 
the OWL representations of the KB contain cycles and bi-directional inheritance, the 
graph cannot be presented as a tree showing all inheritance relationships flowing from the 
top of the visualization area to the bottom. Sometimes this results in a complex jumble 
that has no clear top or bottom organization. However, this aspect of the visualization 
was present in most other techniques considered during the research. This may be due to 
the difficulty of visually displaying semantically rich ontology components and their 
relationships. 
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TouchGraph has an additional software development drawbaek in that it offers no 
API or serviee for importation of data into its system. TouehGraph’s souree eode needed 
to be extensively modified to enable messaging with the OAKDA server, and GUI events 
had to be developed to enable navigation between linked parts of the OWL KB. The 
level of effort it took implement the modifieations used to support OAKDA were not 
trivial. 

Overall, the vision for OAKDA as a tool for a human end user to seareh, proeess 
and navigate OWL-DL ontologies was realized. It hoped that this applieation will be 
used as a tool for those interested in experimenting with the ontologies for knowledge 
diseovery and Web seareh. For those researehing OWL-DL proeessing teehnologies, the 
OAKDA applieation may be a useful as a baseline example. 
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VI. OAKDA VS. GOOGLE COMPARATIVE PERFORMANCE 

STUDY 


A. INTRODUCTION 

This chapter attempts to test and answer the main thesis researeh question: Can 
an ontology-based Web seareh application increase the effectiveness of Web search 
results over existing approaches for those searehes that require a deep eontextual 
knowledge of the domain of interest? This experiment compares the effeetiveness of the 
results of Web search queries formulated by study participants using the OAKDA 
applieation with those obtained by the same participants using the widely popular Google 
seareh engine. The goal: to determine if OAKDA and the ontologies whieh it 
implements will have a significant positive effeet on the precision and relevance of the 
web seareh queries generated by the participants in the experiment. 

Effectiveness will primarily be measured along three dimensions: (1) whether the 
query retrieves pages containing the pertinent answer for the questions asked, i.e. the 
preeision of the results; (2) whether the page retrieved is “abouf ’ the context that relates 
back to the ontology information domain, i.e. the preeision of the eontext; and (3) the 
participant’s own subjective rating on how well OAKDA and Google performed in 
answering the seareh tasks. 

B, EXPERIMENTAL DESIGN 

The experimental design used for the study is deseribed in the following 
subseetions, namely participants, apparatus, and data collection procedures. 

1. Participants 

The participants in this study consisted of 10 adults ranging from 25 to 70 years 
of age. About half of the subjects are information teehnology workers or have some 
involvement in IT an related profession. The rest are employed in other various fields. 
All have, to varying extents, a eollege education and experience in researching topics on 
the Internet. 
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2, Apparatus 

All participants in the study used Mierosoft Internet Explorer and a high speed 
Internet eonneetion. The OAKDA applieation was used as the seareh environment for 
the test group (A full deseription of OAKDA and how it funetions ean be found in 
Chapter 6 of this thesis). The Google Web seareh portal was used as the only seareh 
environment for the control group. 

3, Data Collection Procedures 

The data points eaptured from each study participant were in two main categories: 
personal data and experiment data. The personal data ineluded the partieipant’s name, 
age, and profession. For this study, personal data was not eonsidered in the data analysis. 

Before eondueting the experiment, each subject was trained in the use of OAKDA 
with a sample search task. The training eovered OAKDA’s user interfaee funetions as 
well as reeommendations on how to eonstruet seareh queries. No training was given for 
using Google, but many of the same tips given for OAKDA were applieable for 
produeing effeetive seareh results in Google. The partieipants were assumed to have 
extensive experienee using Google, or Web seareh tools like it. 

Eaeh subjeet answered half of the Web query tasks using OAKDA (Test group) 
and half using Google (Control group). This method ensured that all partieipants took 
part in both the eontrol and test groups an equal number of times. Eaeh Web query was 
more or less equally assigned to the eontrol and test groups 26 . 

For the control group, the study partieipants were allowed as many as three 
attempts to refine their seareh query. They were permitted a maximum of 5 minutes to 
work on eaeh question. During this time, they were allowed to visit the retrieved web 
pages to find information that eould enhanee the subsequent queries. The query term list 
from eaeh attempt was eaptured as a data point. 

The test group was also allowed 5 minutes per questions but they were granted 
more time if they were experieneing teehnieal problems or needed help with applieation 

26 For the rest of the chapter, the sets of control and test data (which are different depending on the 
participant) will be referred to as control and test groups. 
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functionality. No other type of help was given to the test subjects. OAKDA stored all 
submitted queries in a database and only the first three queries for eaeh participant were 
used for the study data. 

The experimental data captured for eaeh partieipant were, for both eontrol and test 
groups, the search query identifier, the number of the attempt, the Web query text, the 
participant’s overall rating of the application performance for all queries, and the 
precision scores as ealculated by the experimenter. One reeord was generated for every 
Web search query. Table 7.1 shows the data definitions for a recorded observation. 


Field Name 

Data 

Type 

Description 

Subjectjd 

Integer 

Unique numeric identifier for person participating 
in the study 

Group_Type 

String 

Identifies data point as being in "TEST" or 
"CONTROL" study group 

Questionjd 

Integer 

Unique numeric identifier for questions posted to 
the participant 

Attempt_# 

Integer 

Search query attempt number (1 ..3) 

Search_Query_String 

String 

Actual Google search term list 

Answer Score 

Number 

Score between (0..1) equal to the number of the 
top 10 web hits generated by the 
Search_Query_String that contained the answer 
to the search task specified by the Questionjd 

Context Score 

Number 

Score between (0..1) equal to the number of the 
top 10 web hits generated by the 
Search_Query_String that contained the correct 
context for the search task’s domain 


Table 14. Definition of Experiment Data 


There were six Web seareh tasks in the study. The participants were instrueted to 
provide the information asked for by the task in the form of a web query that retrieves 
information as relevant as possible to the task’s target answer. The seareh tasks assigned 
to the participants and the associated ontologies are listed in the table 7.2 below. 
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Question 

Id 

Task text 

Ontology File 

1 

Formulate a search query that will retrieve information about 
the geographic features of a tropical Island Nation off the coast 
of the African continent. 

Geography.OWL 

2 

Formulate a Web Search query that retrieves the name of the 
actor(s) who voiced the dog character in the cartoon “Dog 
Trouble” 

Cartoonstar.OWL 

3 

Formulate a search query which will retrieve web pages 
pertaining to at least two French artists who made both 
paintings and sculptures in the style of 19th Century Realism. 

ArtHistory.owl 

4 

Formulate a search query which will retrieve web pages 
pertaining to a computer programming language that that can 
be written in either a "Procedural" or "Object Oriented" 
programming paradigm. This language is a derivative of the 
language used to develop the UNIX operating system. 

Programming 

Languges.OWL 

5 

Formulate a search query which will retrieve web pages 
pertaining to an electric guitar that is the signature model of a 
famous musician. The musician’s first name begins with “L” 
and his career spans from the 1930’s to the present day. 

Guitar.OWL 

6 

Formulate a query that retrieves web pages containing the 
name of a type of golf ball commonly used -150 years before 
the introduction of the modern golf ball and the components 
materials from which it was manufactured. 

Golf.owl 


Table 15. Participant Search Tasks 


After the participants finished creating Web queries for all the search tasks, they 
were asked to rate subjectively the effectiveness of both Google’s and OAKDA’s 
searches on a scale between 0 and 10. This data was intended to represent the 
participant’s perceived measure of OAKDA’s usefulness compared to Google’s, and 
represents the “Participant Rating” score of the experiment data. 

Recall performance was not considered for the study since the data analyzed is 
“pulled” from a Web search portal, the total set of relevant data on the Web was too large 
to be efficiently measured. The precision scores were calculated by checking the 
relevance of the top n number of retrieved Web pages from any given query. In general, 
the precision was defined as the ratio of the number of relevant hits retrieved to the total 
number of examined hits. The statistic was expressed as a percentage. The queries 
recorded for each participant were given a rating score in two different analysis 
categories. 


158 




a. The “Answer Precision” Score 

Each query result was rated between 0 and 1 (0..1) for the presenee of the 
“answer(s)” to the seareh task in the text of the retrieved web hits. The examination of 
the Web pages was aeeomplished with a Java™ program whieh downloaded the pages 
and used regular expression proeessing to find the answer text within the page eontent. 
The top 20 Web hits generated by eaeh query were examined. For eaeh query, the 
answer preeision seore was determined by ealeulating the number of Web sites with the 
answer text divided by the total number of examined sites. A seore of .75 meant the 
answer text was present in 15 out of the top 20 Web pages retrieved by the query. 

b. The “Context Precision” Score 

Similarly, eaeh of the top ten retrieved Web pages was examined to assess 
if they had the correet eontextual baekground pertaining to the seareh task. For example, 
if the seareh task pertains to the domain of golf equipment, and the retrieved web pages 
are about Gutta Pereha tree agrieulture, then the page was eonsidered to be contextually 
ineorrect. The determination of whether a given page was eontextually aecurate was 
aehieved by a visual analysis. The eontext precision score was ealeulated as the ratio of 
the number of pages that had the correet eontext for the search task. 

The seores for eaeh query formed the raw data used for the statistieal 
analysis. As stated earlier, eaeh partieipant answered half (three) of the seareh tasks with 
Google and the other half with OAKDA. The ten partieipants in the study produeed a 
total of 125 Web seareh queries in answering six Web seareh tasks (eaeh was allowed up 
to three attempts). Therefore, each participant produced, on average, 2.08 queries out of 
a possible 3 for eaeh test and eontrol data set. Only the best performing query was kept 
for eaeh seareh task answered, produeing 30 total observations for the test group and 30 
for the eontrol group. Eaeh partieipant’s eontrol and test group seores were averaged, 
leaving a total of ten observations for eaeh experiment group, or 1 observation per 
partieipant. 

This data summary was performed for both the “Answer” and “Context” 
data sets, leaving three total data sets for analysis: Partieipant Rating seore, “Answer” 
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precision score and the “Context” precision score. The next section will describe the type 
of statistical analysis used to measure the data and the findings. 

C. DATA ANALYSIS PROCEDURES 

The methodology followed in the experiment yielded three data sets to be used as 
inputs for the statistical analysis. Table 7.3 below depicts the performance scores for the 
“Participant Rating,” “ Ans wer Precision,” and “Context Precision” data sets. 



Participant Rating 

Answer Precision 

Context Precision 

Participant iD 

Googie 

OAKDA 

Googie 

OAKDA 

Googie 

OAKDA 

1 




0.850 

0.600 

0.900 

2 

0.7 

0.9 


0.567 

0.700 

0.800 

3 

0.7 

0.75 

0.650 

0.900 

0.667 

0.567 

4 

0.4 

0.6 

0.367 

0.550 

0.400 

0.667 

5 

0.6 

0.9 

0.583 

0.817 

0.733 

0.733 

6 

0.65 

0.9 

0.550 

0.833 

0.800 

0.767 

7 

0.4 

0.8 

0.400 

0.867 

0.633 

0.600 

8 

0.3 

0.8 

0.383 

0.600 

0.667 

0.833 

9 

0.7 

0.7 

0.500 

0.833 

0.633 

0.533 

10 

0.4 

0.8 

0.300 

0.783 

0.433 

0.167 

Mean score 

0.545 

0.785 

0.498 

0.760 

0.627 

0.657 


Table 


6 . 


Experiment Data Results 


The objective for the analysis of the data sets was to see if there is a significant 
difference between score means of the test and control data sets. The method used to test 
for significance was the t test dependant means. The t test involves a comparison of 
means from two different groups and focuses on the differences between the scores using 
the following formula: 

,= __ 

V n-1 

where, 

ZD is the sum of all differences between groups 

ZD is the sum of all differences squared between groups 

n is the number of pairs of observations. 
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1, t Tests for the “Participant Rating” 

This section shows the eight steps used to compute the t test statistic. To reiterate, 
the “Participant Rating” scores consist of the participants’ rating scores of the perceived 
effectiveness of OAKDA and Google in retrieving relevant Web pages to answer the 
search tasks 

a. Statement of the Null and Research Hypothesis 

The null hypothesis states that there is no difference in the participant 
rating between effectiveness score means of Google and the OAKDA searches. The 
research hypothesis is that the participants rate the OAKDA searches as more effective 
than Google searches. The research hypothesis is a one-tailed, directional research 
hypothesis because it posits that the OAKDA score will be higher than the Google score. 


The null hypothesis is: = p^akda • 


The research hypothesis is: H, : . 

b. Set the Level of Significance or Type I Error Associated with the 
Null Hypothesis 

The risk of Type I error or level of significance is 0.05. 

c. Select the Appropriate Test Statistic 

The appropriate test statistic is a t test for dependent means, also known as 
the t test for paired samples, since we are dealing with a group of scores for the same 
participants. 

d. Compute the Obtained Value 

The obtained value for t is: 


t 


ZD 

tiTD^-iTpy 

n-\ 


4.65 


e. Determine the Value Needed for the Rejection of the Null 
Hypothesis 

The degrees of freedom are n -1 or 9. Using this value and appropriate t- 
value tables [Salkind, 2004, 358], the value needed for rejection of the null hypothesis at 
the 0.05 significance level is 1.833. 
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/. Compare the Obtained Value to the Critical Value 

The obtained t value is 4.65, larger than the eritieal value of 1.833 needed 
for rejeetion of the null hypothesis. 

g. Decision Time 

Sinee the obtained value is greater than the eritieal value, the null 
hypothesis is rejeeted. This indieates that users rate the effeetiveness of OAKDA 
searehes as, indeed, higher than that of a Google seareh for the experiment query set. 

2, t Test for the “Answer Precision” 

This section shows the eight steps used to compute the t test statistic. To reiterate, 
the “Answer Precision” data set consists of the precision scores of retrieved web pages, 
measured by the number of sites containing the answer text of the search task. 

a. Statement of the Null and Research Hypothesis 

The null hypothesis states that there is no difference between the answer 
precision scores of Google as compared with OAKDA. The research hypothesis is that 
OAKDA search results contain the “answer” to the research task more frequently than 
Google searches. The research hypothesis is a one-tailed, directional research hypothesis 
because it posits that the OAKDA precision will be higher than Google. 

The null hypothesis is: • 

The research hypothesis is: . 

h. Set the Level of Significance or Type I Error Associated with the 
Null Hypothesis 

The risk of Type I error or level of significance is 0.05. 

c. Select the Appropriate Test Statistic 

The appropriate test statistic is a t test for dependent means, also known as 
the t test for paired samples, since we are dealing with a group of scores for the same 
participants. 

d. Compute the Obtained Value 

The obtained value for t is: 
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t 


TD 

nTD^-{TDf 

n-\ 


= 3.86 


e. Determine the Value Needed for the Rejection of the Null 
Hypothesis 

The degrees of freedom are n -1 or 9. Using this value and appropriate t- 
value tables, the value needed for rejection of the null hypothesis at the 0.05 significance 
level is 1.833. 

f Compare the Obtained Value to the Critical Value 
The obtained t-value is 3.86, larger than the critical value of 1.833 needed 
for rejection of the null hypothesis. 

g. Decision Time 

Since the obtained value is greater than the critical value, the null 
hypothesis is rejected; indicating that according to the measure of retrieved Web pages 
containing the search task answer, the precision of OAKDA is higher than Google results 
for the experiment query set. 

3. t Test for the “Context Precision” 

This section shows the eight steps used to compute the t test statistic. To reiterate, 
the “Context Precision” data set consists of the precision scores of retrieved web pages, 
measured by sites containing the correct context of Web page content pertinent to the 
search task. 

a. Statement of the Null and Research Hypothesis 

The null hypothesis states that there is no difference between the 
“Context” precision scores derived from Google as compared with OAKDA. The 
research hypothesis is that OAKDA search results contain the correct research task 
context more often than Google searches. The research hypothesis is a one-tailed, 
directional research hypothesis because it posits that the OAKDA score will be higher 
than the Google score. 


The null hypothesis is: • 


The research hypothesis is: 


> X 


OAKDA • 
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b. Set the Level of Significance or Type I Error Associated with the 
Null Hypothesis 

The risk of Type I error or level of signifioanee is 0.05. 

c. Select the Appropriate Test Statistic 

The appropriate test statistie is a t test for dependent means, also known as 
the t test for paired samples, sinee we are dealing with a group of seores for the same 
partieipants. 

d. Compute the Obtained Value 

The obtained value for t is: 


t 


TD 

nTD^-{TDf 

n-\ 


0.54 


e. Determine the Value Needed for the Rejection of the Null 
Hypothesis 

The degrees of freedom are n -1 or 9. Using this value and appropriate t- 
value tables, the value needed for rejeetion of the null hypothesis at the 0.05 signifieanee 
level is 1.833. 


f Compare the Obtained Value to the Critical Value 
The obtained value is 0.54, smaller than the eritieal value of 1.833 needed 
for rejeetion of the null hypothesis. 

g. Decision Time 

Sinee the obtained value is less than the eritieal value, the null hypothesis 
is upheld, indieating that the effeetiveness of OAKDA was not shown to be higher than 
Google results for the experiment query set. 


D, DISCUSSION 

The strongest performanee indieator favoring OAKDA over Google was the 
partieipants own subjeetive rating. This may be due to test group partieipants feeling that 
they had more eonfirmation of the answer’s eorreetness. Sinee the ontologies visually 
show relationships between domain eoneepts, the partieipants got affirmation more 
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quickly compared to reading through the prose contained in Web pages and “snippets” 
retrieved by Google. 

The “Answer Precision” scores show OAKDA users performing better than those 
using Google as far as retrieving the answer text to the search question. They do not 
however capture the confidence level of whether the user actually believed they found the 
eorrect answer. Several query results with a high precision score surprisingly did not 
contain the answer text in the formulated seareh term list. Rather, by using the closely 
related, but not necessarily correet, terms in the web query, the search retrieved highly 
relevant hits. This result indieates that using the domain eoneepts which are elosely 
related to the sought after term is a good method of formulating a Web seareh query. In 
this study it was not known how many of these high performing queries were 
serendipitous. 

The “Context Preeision” data shows that OAKDA did not deliver a performance 
advantage over the Google users. This result is not surprising since the keywords in the 
text of the search tasks contained broad contextual information about the knowledge 
domain. This was a conseious design choice to help the participants easily loeate the 
appropriate ontology for the search task. The contextual information given away in the 
seareh task allowed the Google group to create queries that performed as well as the 
OAKDA generated queries in terms of eontextual aeeuracy. 

This results of the study demonstrate that an ontology assisted seareh application 
does indeed increase the effeetiveness of the obtained results for those queries that 
require a deep domain knowledge, and provided that: (1) the seareh task’s domain 
eorrelates strongly with an existing and available ontology and (2) the information the 
researeher starts with is suffieient to retrieve the helping ontology but is insuffieient to 
retrieve the sought after answer. 

In the study, every effort was made to craft search tasks that comply with the 
prerequisite eonditions stated above. First, seareh tasks were complex enough so as to 
not “give away” important information that would enable an easy path to the answer via a 
standard Web search. Rather, the information eontained in the seareh task was separated 
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by at least two degrees from the target answer. As an example, eonsider the following 
query: “Find the name of the brother of Charlie Brown’s Psyehoanalyst”. Using the 
Peanuts eartoon domain, the answer, “Linus”, is two degrees away from “Charlie 
Brown”, who is the brother of Luey, who in turn is Charlie Brown’s psyehoanalyst. This 
is an example of a query that requires somewhat deep domain knowledge. Seeond, the 
test users were presumed to have an advantage as long as they managed to loeate the 
ontology eorreetly mapped to the seareh task. Having done this sueeessfully, they only 
needed to traverse the graph nodes from a starting point in the ontology to those nodes 
pointing to the answer. Correetly interpreting the meaning of seareh task and ontology 
eoneepts was also neeessary for suceess. There were no instanees where the test group 
had diffieulty finding and seleeting the appropriate ontology amongst pool of available 
OWL ontologies. If they had not, the retrieval preeision scores would have been lower 
and much more uneven among the among the control group observations. 

It is important to note, that some aspects of the study did create experimental 
noise in the data. The knowledge domains represented in ontologies could have been 
well known to some of the participants. The group scores might have had greater 
contrast if those participants selected had little or no knowledge in the domains selected 
for the search tasks. Also, while every participant answered an equal number of 
questions from test and control a group, not every question was visited an equal number 
of times. It is possible that this caused some skewing of the results. Furthermore, 
participant performance varied based on how closely they read and understood the search 
tasks as well as their skill at using and comprehending the OAKDA user interface. 

Taken together, the significant means of the Rating scores and Answer precision 
imply that an ontology assisted search application can make a positive contribution to a 
researcher’s search effectiveness. 
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VII. SUMMARY, CONCLUSIONS AND LESSONS LEARNED 


A, SUMMARY 

The stated goal of this thesis was to build an ontology-based Web applieation to 
assist in domain knowledge diseovery and improve Web seareh query by narrowing the 
seope of the returned list of results. To this end, the primary researeh question was, Can 
an ontology-based application be built to narrow, expand, or refine Web search terms ? 

In addition, several seeondary researeh questions follow: 

1. What is an appropriate approach for accessing and processing of contextual 
information of an OWL knowledge base? 

2. What is the most appropriate architecture for the prototype application? 

3. How can an ontology inference engine interface with the application ? 

4. Is there a method of visually rendering the ontologies for greater usability, 
navigation and comprehension of domain knowledge? 

This thesis addressed these questions using a two-step methodology. The first step was 

to researeh and make a ease for the value of ontologies as a knowledge representation 

system that models the real world domain into elasses, instanees and the relationships 

between them. A thorough review of RDF and OWL provided the meehanism for 

understanding OWL-DL semanties and how they ean be used to define the eomponents 

of an ontology. It was neeessary to fully eomprehend the OWL eonstruets in order to 

overeome the eommon mistakes and pitfalls of ontology development. As part of the 

ontology development proeess, a methodology for developing OWL-DL ontologies was 

proposed and demonstrated by the eonstruetion of a sample ontology in the geography 

domain. 

The seeond step was to design, develop and test an ontology-based Web 
applieation, ealled OAKDA, to allow users to seareh the ontology library and traverse the 
ontology graph to diseover domain knowledge for finding relevant Web seareh terms. 
When OAKDA was eompleted, an experiment was eondueted to test whether the system 
eould improve the preeision of Web seareh eompared to using standard Web portal 
searehing sites. 
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B, EVALUATION 

The first major milestone of this thesis was the development of the sample 
ontologies. A signifieant effort was spent understanding the use of ontologies and 
learning RDF and OWL semanties. This knowledge was instrumental in proposing a 
development methodology that enabled the eonstruetion of a valid geography domain 
ontology. The final ontology was verified with Raeer, an ontology infereneing engine, 
eliminating elassifieation errors and ineonsisteneies. The learned development methods 
were later used to ereate the ontologies used to eonduet experiments to measure 
OAKDA’s performanee in aiding Web search. 

The second milestone was the overall design and execution of the OAKDA 
application. Bringing OAKDA from conceptualization to implementation involved 
finding workable solutions for key data processing areas, namely Description Logics 
reasoning, ontology query, and an intuitive GUI visualization framework to facilitate 
human interaction with the application. The selection of a multi-tier architecture 
contributed significantly to the success of the development effort. It enabled an effective 
prototyping methodology used throughout the development cycle of the application. It 
encouraged re-use of tier components as changes were made. The choice of Racer 
middleware was well suited as a platform for reasoning and querying of OWL-DL 
ontologies. Racer performed well as an ontology infereneing engine and provisioned an 
extensive language for query. Racer’s reasoning service was capable of calculating 
implicit role relationships, reclassifying individuals based on their property restrictions, 
and correctly determined subsumption relationships between classes. Implementing 
TouchGraph software for visualization and navigation for OAKDA’s GUI interface made 
an important contribution its usability from a user perspective. 

The outcome of experiments conducted to test search performance between 
groups of people using OAKDA compared with Google showed that ontology aided 
search can confer an advantage. A performance comparison of OAKDA and Google 
showed a statistically significant and positive difference in the test group (OAKDA) 
results when performing certain research tasks. While the results showed the overall 
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contextual accuracy of retrieved web pages between the test and control groups were not 
significantly different, the test group performed better in retrieving the specific target 
information requested by the research task. Also, the participant survey showed that the 
test group rated OAKDA as more effective than Google. This implies that OAKDA 
users received more confirmation, or had a greater level of confidence, that they 
discovered the data they were tasked to retrieve. 

It seems that OAKDA’s effectiveness in Web research depends upon two factors 
- “aboutness”27 and “availability” of the ontology information in its database. First, the 
degree of the ontology's aboutness to the domain of interest determines the relevancy of 
the knowledge found in the ontology. Terms with a high degree of aboutness do a better 
job of retrieving the body of information they reference. Although measuring aboutness 
of retrieved OWL-DL data is not in the scope of this project, an ontology is meant to be 
definitional or descriptive of a domain and should be useful in aiding web search as long 
as the ontology is not misleading in its content. Second, OAKDA relies on a 
presumption that an ontology exists in the users area of interest and that the ontology 
contains knowledge the researcher does not yet posses. 


C. LESSONS LEARNED 

In completing this thesis, several lessons were learned. The first was the 
realization that ontology development is a non trivial endeavor. Developing an ontology 
requires domain expert knowledge in addition to a complete understanding of the 
ontology language syntax and semantics. Although the level of domain knowledge 
required is contingent on the scope and level of detail of the ontology, having an in-depth 
knowledge of the domain concepts and their relationships allows the developer to 
accurately capture the semantics and relationships between classes. Due to the inherent 
level of latitude in representing type/class hierarchy in concept modeling, no two subject- 
matter experts (SMEs) will design an ontology of a given domain in an identical manner. 
Depending on the domain of context, there is usually a significant degree of subjectivity 

27 Aboutness is broadly defined as a degree in whieh a set of returned resourees is “about” a partieular 
domain of interest. For instanee, “if a system determines that a doeument d is topieally related (i.e. about) 
to query q, then the doeument is returned to the user.” (Bruza et ah, 2000, 1) 
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in specifying class definitions and relationships, with no absolute superiority of one 
particular schema over another. Also, inconsistencies are easily introduced when 
creating a class hierarchy of an ontology. Development tools, such as Protege and Racer, 
help mitigate these errors by providing services to verify the validity and consistency of 
the ontology. By performing consistency checks on the ontology at various stages of its 
development, it was possible to find and correct classification errors or inconsistencies in 
class definitions. 

A second lesson was learned during the experiment conducted to measure the 
effectiveness of the OAKDA application. The Swoogle28 project, funded by the 
University of Maryland, Baltimore, is a vast searchable repository of indexed OWL and 
RDF ontologies. Swoogle uses automated Web crawling techniques similar to those 
employed by the major Web search portals to find available ontologies out on the Web. 
Swoogle differs from other search portals in that it only seeks RDF and OWL Web 
content. Swoogle was searched extensively for OWL ontologies to be used for the 
experiment, but none found to suitable because of their structural attributes, content or 
both. Instead, those ontologies had to be developed by the thesis authors. It may be that 
in some circumstances, ontologies need to be structured in particular ways suited to how 
they are being used. 

D. FUTURE WORK 

Currently the OWL database in OAKDA is too limited to provide usability 
outside of the bounds of experiment. In order for an application like OAKDA to have 
wide popularity as a research tool, there would need to be a repository of appropriately 
structured ontology files with an encyclopedic breadth. This scenario is at this time not 
likely to come about. OAKDA represents a departure from many other proposed uses of 
ontology data in that much previous work has been to try to leverage ontologies to 
enhance machine capabilities. OAKDA seeks to enhance the research facility of the 
human user. Future efforts that would be able to draw from the technologies exhibited in 
OAKDA might be graphical visualizations of ontologies geared to aid navigation to 

28 http://swoogle.umbc.edu - The first three letters of Swoogle stand for Semantic Web Ontology 
while the rest of the name is meant to refer to a popular Web search portal. 
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related information reeomposed as an ontology. For example, suppose someone 
developed an ontology about a library eatalog. A web site of the library’s eatalog eould 
show a navigation graph depleting the kinds of relationships between related referenee 
materials. Another usage might be based on an ontology that deseribes the relationships 
between elass fdes, funetion points, variables, ete in a grouping of programming eode. A 
visualization of this ontology eould give a team of developers an easy way to navigate the 
strueture of program and quiekly learn its topology. Beeause of the regularity of 
computer programming syntax, the ontology might be able to be regenerated from the 
code base as changes are made. 

E, CONCLUSIONS 

OAKDA was built as project to explore the methodologies for accessing and 
processing OWL-DL ontologies in a framework which attempts to leverage domain data 
to enhance Web search queries. OAKDA successfully integrated several components 
capable of performing these operations. OAKDA’s architecture demonstrates one of a 
few possible frameworks for attaining a Semantic Web enabled application. OAKDA 
overcame challenges in implementing a system to perform both machine processing and 
human interaction with Semantic data. OAKDA demonstrates the feasibility of these 
technologies and could serve as a baseline for other types of Semantic Web applications 
that require similar functionality. 

The central assertion of Semantic Web is to provide a universal medium for the 
exchange of data where data can be shared and processed by automated tools as well as 
by people. This thesis showed the value of ontologies as a system for human-processable 
knowledge representation. Through applications like OAKDA, that employ OWL-DL 
ontologies for semantic processing, communities in every field of interest should be 
encouraged to capture their knowledge by developing ontologies. It is our hope that the 
research conducted here may suggest future methodologies for the realm of applications 
leveraging knowledge representation techniques. 
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APPENDIX 


A. ACRONYM TABLE 


Acronym/Term 

Description 

API 

Application Programming Interface 

BOT 

Software Robot 

CSS 

Caseading Style Sheets 

DBMS 

Database Management System 

DL 

Deseription Logies 

GUI 

Graphieal User Interfaee 

HTML 

Hypertext Markup Language 

HTTP 

Hypertext Transfer Protoeol 

IP 

Internet Protoeol 

JDBC 

Java^M Database Conneetivity 

JRE 

Java™ Runtime Environment 

KB 

Knowledge Base 

OAKDA 

Ontology Aided Knowledge Diseovery Applieation 

ODBC 

Open Database Conneetivity 

OWL-DL 

Web Ontology Language- Deseription Logies 

SOAP 

Simple Objeet Aeeess Protoeol 

SQL 

Strueture Query Language 

TCP/IP 

Transfer Control Protoeol / Internet Protoeol 

UDDI 

Universal Diseovery Deseription Integration 

Wiki 

A website or similar online resouree whieh allows users 


to add and edit eontent eolleetively 

WSDL 

Web Serviee Deseription Language 

XML 

extensible Markup Language 
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B. JAVA EXAMPLE CODE - “INTERFACING WITH THE RACER 
SERVER” 


// instanciate new racer client instance 

RacerClient client = new RacerClient( "10.33.10.19", // ip address 

8000) ; // tcp port 

// prepare read command for racer 

String racerCommand = "(owl-read-file \"" + "geography.owl" + "\")"; 

// try block for racer client commands 

try 

{ 

// execute read racer file 

client. synchronousSend( racerCommand); 

} 

// catch io exceptions 
catch (lOException ioe) 

{ 

System.out.printin( "error during read of file." + ioe.getKessage()) ; 

} 

// catch racer exception 
catch (RacerException re) 

{ 

System.out. println("err or during read of file." + re. getKessage()) ; 

} 


Figure 1; Java code used to execute commands for Racer to read OWL-DL file 
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// instanciate new racer client instance 

RacerClient client = new RacerClient( "10. 33 .10.19", // ip address 

8000) ; // tcp port 

// prepare read command for racer 

String racerCommand = "(owl-read-file \"" + "geography.owl" + 

// try block 

try 

{ 

// execute the racer read file command 
client.synchronousSend(racerCommand); 

// set string values used in racer queries 

String individual "georgia"; // set node name of individual 

String propertyName = "sharesBorderWith"; // set node name of propery 
String className = "state"; // set class node name 

// get the name of Racers current A-Box or name of asserted content 
// in the current ontology 
String aBox = client.currentftBox(); 

// query racer to get the list most specific class names from which 
// "georgia" was instanciated 

String[] classes = client.mostSpecificinstantiators( individual, 

aBox, 

); 

II query racer retrieve all indivduals instanciated from the "state" class 
String[] individuals = client.retrieveConceptlnstances( classHame, 

aBox, 

); 

// query racer to get the list of individuals that 
// that share a border with "georgia" 

String[] RangeIndividuals = client.retrieveIndividualFillers( individal, 

propertyName, 

aBox 

); 

} 

II catch exception 
catch (lOException ioe) 

{ 

System.out.printin( "error during read of file." + ioe.getMessage()) ; 

} 

II catch exception 
catch (RacerException re) 

{ 

System.out.printin( "error during read of file." + re.getltessage()) ; 

} 


Figure 2; Java code used to execute commands for Racer to read OWL-DL file and 
perform various queries which retrieve data in Java typed language structures. 
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